Re: pg_upgrade can result in early wraparound on databases with high transaction load

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Jason Harvey <jason(at)reddit(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Tharakan, Robins" <thararamazoncom(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_upgrade can result in early wraparound on databases with high transaction load
Date: 2021-04-24 00:28:27
Message-ID: 20210424002827.GO7256@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, Apr 23, 2021 at 04:42:56PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-06-15 11:37:59 -0700, Noah Misch wrote:
> > On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> > > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason(at)reddit(dot)com> wrote:
> > > > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
> > > >
> > > > UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0
> >
> > That's bad.
>
> Yea. The code triggering it in pg_resetwal is bogus as far as I can
> tell. That pg_upgrade triggers it makes this quite bad.
>
> I just hit issues related to it when writing a wraparound handling
> test. Peter remembered this issue (how?)...
>
> Especially before 13 (inserts triggering autovacuum) it is quite common
> to have tables that only ever get vacuumed due to anti-wraparound
> vacuums. And it's common for larger databases to increase
> autovacuum_freeze_max_age. Which makes it fairly likely for this to
> guess an oldestXid value that's *newer* than an accurate one. Since
> oldestXid is used in a few important-ish places (like triggering
> vacuums, and in 14 also some snapshot related logic) I think that's bad.
>
> The relevant code:
>
> if (set_xid != 0)
> {
> ControlFile.checkPointCopy.nextXid =
> FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
> set_xid);
>
> /*
> * For the moment, just set oldestXid to a value that will force
> * immediate autovacuum-for-wraparound. It's not clear whether adding
> * user control of this is useful, so let's just do something that's
> * reasonably safe. The magic constant here corresponds to the
> * maximum allowed value of autovacuum_freeze_max_age.
> */
> ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> }
>
> Originally from:
>
> commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: 2009-08-31 02:23:23 +0000
>
> Track the current XID wrap limit (or more accurately, the oldest unfrozen
> XID) in checkpoint records. This eliminates the need to recompute the value
> from scratch during database startup, which is one of the two remaining
> reasons for the flatfile code to exist. It should also simplify life for
> hot-standby operation.
>
> I think we should remove the oldestXid guessing logic, and expose it as
> an explicit option. I think it's important that pg_upgrade sets an
> accurate value. Probably not worth caring about oldestXidDB though?

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
https://www.postgresql.org/message-id/flat/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com
https://www.postgresql.org/message-id/flat/a9f9376f1c3343a6bb319dce294e20ac%40EX13D05UWC001.ant.amazon.com
https://www.postgresql.org/message-id/flat/cc089cc3-fc43-9904-fdba-d830d8222145%40enterprisedb.com#3eec85391c6076a4913e96a86fece75e
> Allows the user to provide a constant via pg_upgrade command-line, that
>overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
>(window of) Transaction IDs available for pg_upgrade to complete.

--
Justin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2021-04-24 01:00:05 Re: pg_upgrade can result in early wraparound on databases with high transaction load
Previous Message Tomas Vondra 2021-04-24 00:06:17 Re: BRIN index on timestamptz

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-24 00:29:21 Re: Testing autovacuum wraparound (including failsafe)
Previous Message Andres Freund 2021-04-23 23:42:56 Re: pg_upgrade can result in early wraparound on databases with high transaction load