Re: pg_upgrade and wraparound

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Alexander Shutyaev <shutyaev(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_upgrade and wraparound
Date: 2018-06-11 17:29:25
Message-ID: 20180611172925.fqq3aj7wyrs5orym@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2018-06-11 13:14:12 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I suspect the issue is that pg_resetwal does:
> > if (set_xid != 0)
> > {
> > ControlFile.checkPointCopy.nextXid = set_xid;
>
> > /*
> > * For the moment, just set oldestXid to a value that will force
> > * immediate autovacuum-for-wraparound. It's not clear whether adding
> > * user control of this is useful, so let's just do something that's
> > * reasonably safe. The magic constant here corresponds to the
> > * maximum allowed value of autovacuum_freeze_max_age.
> > */
> > ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> > if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> > ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> > ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> > }
>
> > but we have codepath that doesn't check for oldestXidDB being
> > InvalidOid. Not great.
>
> Hm, I think I'd define the problem as "pg_resetwal is violating the
> expectation that oldestXidDB be valid".

Well, what could it do otherwise? ForceTransactionIdLimitUpdate()
currently does a syscache check for database existence. That'll just
return a lookup failure for InvalidOid, so we're reasonably good on that
front.

Using a hardcoded 2000000000 seems worse, will have funny results if
running with a smaller autovacuum_freeze_max_age...

> However, this just explains the basically-cosmetic issue that the
> complaint message mentions OID 0. It doesn't really get us to the
> answer to why Alexander is seeing a failure. It might be useful
> to see pg_controldata output for the old cluster, as well as
> "select datname, datfrozenxid from pg_database" output from the
> old cluster.

pg_upgrade starts the server with autovacuum disabled, I suspect
restoring all the large objects ends up using a lot of transaction
ids. GetNewTransactionId() should start autovacuum, but I'd guess that's
where things are going wrong for some reason.

Alexander, could you hack things up so autovacuum logging is enabled
(log_autovacuum_min_duration=0), and see whether it's triggered?

I'm not entirely clear why pg_restore appears to use a separate
transaction for each large object, surely exascerbating the problem.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alexey Dokuchaev 2018-06-11 18:21:00 Re: Catching unique_violation exception on specific column/index
Previous Message Tom Lane 2018-06-11 17:14:12 Re: pg_upgrade and wraparound