Re: What is "wraparound failure", really?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: What is "wraparound failure", really?
Date: 2021-06-27 23:23:25
Lists: pgsql-hackers

On 6/27/21 4:36 PM, Peter Geoghegan wrote:
> The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
> documentation -- just a basic description of how the GUCs work. I
> think that it certainly merits some discussion under "25.1. Routine
> Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
> ID Wraparound Failures". One reason why this didn't happen in the
> original commit was that I just didn't know where to start with it.
> The docs in question have said this since 2006's commit 48188e16 first
> added autovacuum_freeze_max_age:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."
>
> This sentence seems completely unreasonable to me. It seems to just
> ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
> the *risk* that the system will stop being able to allocate new XIDs
> because GetNewTransactionId() errors out with "database is not
> accepting commands to avoid wraparound data loss...". Sure, it's
> possible to take a lot of risk here without it ever blowing up in your
> face. And if it doesn't blow up then the downside really is zero. This
> is hardly a sensible way to talk about this important risk. Or any
> risk at all.
>
> At first I thought that the sentence was not just misguided -- it
> seemed downright bizarre. I thought that it was directly at odds with
> the title "Preventing Transaction ID Wraparound Failures". I thought
> that the whole point of this section was how not to have a wraparound
> failure (as I understand the term), and yet we seem to deliberately
> ignore the single most important practical aspect of making sure that
> that doesn't happen. But I now suspect that the basic definitions have
> been mixed up in a subtle but important way.
>
> What the documentation calls a "wraparound failure" seems to be rather
> different to what I thought that that meant. As I said, I thought that
> that meant the condition of being unable to get new transaction IDs
> (at least until the DBA runs VACUUM in single user mode). But the
> documentation in question seems to actually define it as "the
> condition of an old MVCC snapshot failing to see a version from the
> distant past, because somehow an XID wraparound suddenly makes it look
> as if it's in the distant future rather than in the past". It's
> actually talking about a subtly different thing, so the "sole
> disadvantage" sentence is not actually bizarre. It does still seem
> impractical and confusing, though.
>
> I strongly suspect that my interpretation of what "wraparound failure"
> means is actually the common one. Of course the system is never under
> any circumstances allowed to give totally wrong answers to queries, no
> matter what -- users should be able to take that much for granted.
> What users care about here is sensibly managing XIDs as a resource --
> preventing "XID exhaustion" while being conservative, but not
> ridiculously conservative. Could the documentation be completely
> misleading users here?
>
> I have two questions:
> 1. Do I have this right? Is there really confusion about what a
> "wraparound failure" means, or is the confusion mine alone?
> 2. How do I go about integrating discussion of the failsafe here?
> Anybody have thoughts on that?

AIUI, actual wraparound (i.e. an XID crossing the event horizon so that
it appears to be in the future) is no longer possible. But it was once a
very real danger. Maybe the docs haven't quite caught up.
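
To make the "event horizon" concrete, here is a minimal sketch of a
circular XID comparison, assuming 32-bit XIDs compared modulo 2^32 via a
signed difference. The name xid_precedes is hypothetical and the special
reserved XIDs are ignored; this is an illustration, not the server's
actual code.

```python
XID_MODULUS = 2**32       # XIDs are treated here as 32-bit counters
HALF_SPACE = 2**31        # the "event horizon": half of the XID space


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a looks older than XID b.

    The difference is taken modulo 2^32 and then reinterpreted as a
    signed 32-bit value, so "older" means "within the 2^31 XIDs behind b".
    Simplification: special/reserved XIDs are not handled.
    """
    diff = (a - b) % XID_MODULUS
    if diff >= HALF_SPACE:
        diff -= XID_MODULUS   # reinterpret as a negative signed value
    return diff < 0


old_xid = 100

# While the counter stays within 2^31 of the old XID, it still looks old.
near = (old_xid + HALF_SPACE - 1) % XID_MODULUS
print(xid_precedes(old_xid, near))   # True

# One step past the horizon, the same old XID appears to be in the future.
far = (old_xid + HALF_SPACE + 1) % XID_MODULUS
print(xid_precedes(old_xid, far))    # False
```

The second case is the "actual wraparound" scenario: nothing about the
old XID changed, but the comparison flips once it falls more than 2^31
XIDs behind.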

In practical terms, there is an awful lot of headroom between the
default for autovacuum_freeze_max_age and any danger of major
anti-wraparound measures. Say you increase it to 1 billion from the
default of 200 million. That still leaves you roughly 1 billion
transactions of headroom.
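
The arithmetic behind that estimate can be sketched as follows, assuming
the usable horizon is about 2^31 XIDs and treating the result as an
upper bound (the real safety limits trigger somewhat before the horizon
itself). The function name headroom is hypothetical.

```python
XID_HORIZON = 2**31   # roughly 2.1 billion XIDs can be "in the past"


def headroom(freeze_max_age: int) -> int:
    """Upper bound on XIDs left between the point where anti-wraparound
    autovacuum is forced and the horizon (illustrative only)."""
    return XID_HORIZON - freeze_max_age


print(f"{headroom(200_000_000):,}")     # 1,947,483,648 (default setting)
print(f"{headroom(1_000_000_000):,}")   # 1,147,483,648 (raised to 1 billion)
```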



Andrew Dunstan
