Re: pg_upgrade failing for 200+ million Large Objects

From: "Tharakan, Robins" <tharar(at)amazon(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: 2021-03-08 11:02:04
Message-ID: a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
Lists: pgsql-hackers

Thanks Peter.

The original email [1] had some more context that somehow didn't get
associated with this recent email. Apologies for any confusion.

In short, pg_resetxlog (and pg_resetwal) employs a magic constant [2] (in
both v9.6 and master) that appears to have been chosen to force an
aggressive autovacuum as soon as the upgrade completes. Although that works
as planned, it narrows the window of transaction IDs available for the
upgrade (beyond which XID wraparound protection kicks in and aborts the
upgrade) to roughly 146 million.
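
To make the arithmetic explicit, here is a rough back-of-the-envelope
sketch (plain standalone C, not PostgreSQL source) of where that figure
comes from; the server's actual stop limit sits a few million XIDs short of
the wrap point, so treat the printed number as approximate:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t	magic_distance = 2000000000;		/* pg_resetwal sets oldestXid = nextXid - this */
	uint32_t	wrap_distance = UINT32_C(1) << 31;	/* ~2.147 billion XIDs to the wrap point */

	/*
	 * Raw headroom left for the upgrade; the server refuses to assign new
	 * XIDs a few million short of the wrap point, which is where the
	 * ~146 million figure comes from.
	 */
	printf("headroom: ~%u XIDs\n", wrap_distance - magic_distance);
	return 0;
}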

Reducing this magic constant widens the XID window, which is what the patch
does. With the patch, I was able to successfully upgrade a cluster with
500 million Large Objects (an upgrade that otherwise reliably fails). In the
original email [1] I had also listed a few other possible workarounds, but I
was unsure which of them would be a good direction to pursue, so this patch
is meant as a starting point.
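
To illustrate the trade-off, here is another small standalone sketch (the
candidate distances below are purely illustrative, not values taken from the
patch): as far as I can tell, any distance that is still at or above the
configured autovacuum_freeze_max_age keeps the "aggressive autovacuum right
after upgrade" behaviour, while leaving a much larger window for the upgrade
itself:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	const uint32_t	wrap_distance = UINT32_C(1) << 31;	/* ~2.147 billion XIDs to the wrap point */
	const uint32_t	candidates[] = {
		2000000000u,	/* current magic constant (max allowed autovacuum_freeze_max_age) */
		1000000000u,	/* purely illustrative reduced value */
		200000000u		/* default autovacuum_freeze_max_age */
	};

	for (int i = 0; i < 3; i++)
		printf("distance %10u -> upgrade window ~%u XIDs\n",
			   candidates[i], wrap_distance - candidates[i]);
	return 0;
}

For example, a distance of 200 million would leave roughly 1.9 billion XIDs
for the upgrade while still vacuuming any table older than the default
freeze age immediately afterwards.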

References:
1) https://www.postgresql.org/message-id/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com
2) https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L444

-
robins | tharar@ | syd12

> -----Original Message-----
> From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
> Sent: Monday, 8 March 2021 9:25 PM
> To: Tharakan, Robins <tharar(at)amazon(dot)com>; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: pg_upgrade failing for 200+ million Large Objects
>
> On 07.03.21 09:43, Tharakan, Robins wrote:
> > Attached is a proof-of-concept patch that allows Postgres to perform
> > pg_upgrade if the instance has Millions of objects.
> >
> > It would be great if someone could take a look and see if this patch
> > is in the right direction. There are some pending tasks (such as
> > documentation / pg_resetxlog vs pg_resetwal related changes) but for
> > now, the patch helps remove a stalemate where if a Postgres instance
> > has a large number (accurately speaking 146+ Million) of Large
> > Objects, pg_upgrade fails. This is easily reproducible and besides
> > deleting Large Objects before upgrade, there is no other (apparent) way
> for pg_upgrade to complete.
> >
> > The patch (attached):
> > - Applies cleanly on REL9_6_STABLE -
> > c7a4fc3dd001646d5938687ad59ab84545d5d043
> > - 'make check' passes
> > - Allows the user to provide a constant via pg_upgrade command-line,
> > that overrides the 2 billion constant in pg_resetxlog [1] thereby
> > increasing the (window of) Transaction IDs available for pg_upgrade to
> complete.
>
> Could you explain what your analysis of the problem is and why this patch
> (might) fix it?
>
> Right now, all I see here is, pass a big number via a command-line option
> and hope it works.
