Re: What is "wraparound failure", really?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: What is "wraparound failure", really?
Date: 2021-06-27 23:23:25
Message-ID: 9b18e359-1183-a45b-4c99-ca93655edced@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 6/27/21 4:36 PM, Peter Geoghegan wrote:
> The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
> documentation -- just a basic description of how the GUCs work. I
> think that it certainly merits some discussion under "25.1. Routine
> Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
> ID Wraparound Failures". One reason why this didn't happen in the
> original commit was that I just didn't know where to start with it.
> The docs in question have said this since 2006's commit 48188e16 first
> added autovacuum_freeze_max_age:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."
>
> This sentence seems completely unreasonable to me. It seems to just
> ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
> the *risk* that the system will stop being able to allocate new XIDs
> because GetNewTransactionId() errors out with "database is not
> accepting commands to avoid wraparound data loss...". Sure, it's
> possible to take a lot of risk here without it ever blowing up in your
> face. And if it doesn't blow up then the downside really is zero. This
> is hardly a sensible way to talk about this important risk. Or any
> risk at all.
>
> At first I thought that the sentence was not just misguided -- it
> seemed downright bizarre. I thought that it was directly at odds with
> the title "Preventing Transaction ID Wraparound Failures". I thought
> that the whole point of this section was how not to have a wraparound
> failure (as I understand the term), and yet we seem to deliberately
> ignore the single most important practical aspect of making sure that
> that doesn't happen. But I now suspect that the basic definitions have
> been mixed up in a subtle but important way.
>
> What the documentation calls a "wraparound failure" seems to be rather
> different to what I thought that that meant. As I said, I thought that
> that meant the condition of being unable to get new transaction IDs
> (at least until the DBA runs VACUUM in single user mode). But the
> documentation in question seems to actually define it as "the
> condition of an old MVCC snapshot failing to see a version from the
> distant past, because somehow an XID wraparound suddenly makes it look
> as if it's in the distant future rather than in the past". It's
> actually talking about a subtly different thing, so the "sole
> disadvantage" sentence is not actually bizarre. It does still seem
> impractical and confusing, though.
>
> I strongly suspect that my interpretation of what "wraparound failure"
> means is actually the common one. Of course the system is never under
> any circumstances allowed to give totally wrong answers to queries, no
> matter what -- users should be able to take that much for granted.
> What users care about here is sensibly managing XIDs as a resource --
> preventing "XID exhaustion" while being conservative, but not
> ridiculously conservative. Could the documentation be completely
> misleading users here?
>
> I have two questions:
>
> 1. Do I have this right? Is there really confusion about what a
> "wraparound failure" means, or is the confusion mine alone?
>
> 2. How do I go about integrating discussion of the failsafe here?
> Anybody have thoughts on that?
>

AIUI, actual wraparound (i.e. an xid crossing the event horizon so it
appears to be in the future) is no longer possible. But it once was a
very real danger. Maybe the docs haven't quite caught up.

In practical terms, there is an awful lot of head room between the
default for autovacuum_freeze_max_age and any danger of major
anti-wraparound measures. Say you increase it to 1bn from the default
200m. That still leaves you ~1bn transactions of headroom.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-06-28 00:17:55 Re: Fix uninitialized copy_data var (src/backend/commands/subscriptioncmds.c)
Previous Message Justin Pryzby 2021-06-27 22:34:56 Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...