Re: pg_upgrade and wraparound

From: Jan Wieck <jan(at)wi3ck(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Alexander Shutyaev <shutyaev(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_upgrade and wraparound
Date: 2021-03-12 23:13:33
Message-ID: f09a8c8e-b936-9878-4665-f21d4009e034@wi3ck.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Resurrecting an old thread.

We (AWS) have seen this wraparound during pg_upgrade more often recently
with customers who have millions of large objects in their databases.

On 6/11/18 1:14 PM, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> I suspect the issue is that pg_resetwal does:
>> if (set_xid != 0)
>> {
>> ControlFile.checkPointCopy.nextXid = set_xid;
>
>> /*
>> * For the moment, just set oldestXid to a value that will force
>> * immediate autovacuum-for-wraparound. It's not clear whether adding
>> * user control of this is useful, so let's just do something that's
>> * reasonably safe. The magic constant here corresponds to the
>> * maximum allowed value of autovacuum_freeze_max_age.
>> */
>> ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>> if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>> ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
>> ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
>> }
>
>> but we have codepath that doesn't check for oldestXidDB being
>> InvalidOid. Not great.
>
> Hm, I think I'd define the problem as "pg_resetwal is violating the
> expectation that oldestXidDB be valid".
>
> However, this just explains the basically-cosmetic issue that the
> complaint message mentions OID 0. It doesn't really get us to the
> answer to why Alexander is seeing a failure. It might be useful
> to see pg_controldata output for the old cluster, as well as
> "select datname, datfrozenxid from pg_database" output from the
> old cluster.

Unfortunately I don't have pg_controldata output from the old clusters
either. I would like to be able to artificially create an "old" cluster
that fails during pg_upgrade in that way.

One of the things in my way is that when using pg_resetwal to put the
NextXID way into the future (to push the old cluster close to wraparound
for example), the postmaster won't start because it doesn't have the
pg_xact files for that around. Should pg_resetwal create the files in
the gap between the old NextXID and the new one?

Onw thing I do have is a patch that provides a workaround for the
problem as well as a substantial speed improvement for the case at hand.
This patch adds some options to pg_upgrade, pg_dump and pg_restore.

Option added to pg_dump:

--blob-in-parallel

This option requires --schema-only. It causes pg_dump to emit the BLOB
metadata with SECTION_DATA instead of SECTION_PRE_DATA. This causes the
statements for creating the large object metadata (lo_create(OID) and
ALTER LARGE OBJECT) to move into the parallel phase of pg_restore, which
means that their metadata will be created in parallel. In my tests a
database containing large objects only is upgraded in 1/#cores the time.

Option added to pg_restore:

--blob-batch-size=N

With this option pg_restore tries to put N BLOB TOC entries into one
transaction. This is per parallel worker and it will commit those
batches if there is a change in object type, so only BLOB TOC entries
will ever be batched at all. With a sufficient
'max_locks_per_transation' a --blob-batch-size=1000 nicely reduces the
number of XIDs consumed for upgrading 10M large objects from 20M to 10K.

Options added to pg_upgrade:

--blob-in-parallel forwarded to pg_dump
--blob-batch-size=N forwarded to pg_restore
--restore-jobs=N forwarded as --jobs=N to pg_restore

Patch is attached.

Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services

Attachment Content-Type Size
pg_upgrade_improvements.v1.diff text/x-patch 12.6 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2021-03-13 01:30:44 Re: pg_upgrade and wraparound
Previous Message Brent Wood 2021-03-12 20:44:12 Re: hstore each() function - returned order??