Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-03-31 00:50:42
Message-ID: CAH2-Wzn_LLtTk-jGranhLcciSt_70pFKJNngZOf8Gt_MNgTmnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 30, 2022 at 12:01 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Perhaps something is amiss inside vac_update_relstats(), where the
> boolean flag that indicates that pg_class.relfrozenxid was advanced is
> set:
>
> if (frozenxid_updated)
> *frozenxid_updated = false;
> if (TransactionIdIsNormal(frozenxid) &&
> pgcform->relfrozenxid != frozenxid &&
> (TransactionIdPrecedes(pgcform->relfrozenxid, frozenxid) ||
> TransactionIdPrecedes(ReadNextTransactionId(),
> pgcform->relfrozenxid)))
> {
> if (frozenxid_updated)
> *frozenxid_updated = true;
> pgcform->relfrozenxid = frozenxid;
> dirty = true;
> }
>
> Maybe the "existing relfrozenxid is in the future, silently update
> relfrozenxid" part of the condition (which involves
> ReadNextTransactionId()) somehow does the wrong thing here. But how?

I tried several times to recreate this issue on CI. No luck with that,
though -- can't get it to fail again after 4 attempts.

This was a VACUUM of pg_database, run from an autovacuum worker. I am
vaguely reminded of the two bugs fixed by Andres in commit a54e1f15.
Both were issues with the shared relcache init file affecting shared
and nailed catalog relations. Those bugs had symptoms like " ERROR:
found xmin ... from before relfrozenxid ..." for various system
catalogs.

We know that this particular assertion did not fail during the same VACUUM:

Assert(vacrel->NewRelfrozenXid == OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
vacrel->NewRelfrozenXid));

So it's hard to see how this could be a bug in the patch -- the final
new relfrozenxid is presumably equal to VACUUM's OldestXmin in the
problem scenario seen on the CI Windows instance yesterday (that's why
this earlier assertion didn't fail). The assertion I'm showing here
needs the "vacrel->NewRelfrozenXid == OldestXmin" part of the
condition to account for the fact that
OldestXmin/GetOldestNonRemovableTransactionId() is known to "go
backwards". Without that the regression tests will fail quite easily.

The surprising part of the CI failure must have taken place just after
this assertion, when VACUUM's call to vacuum_set_xid_limits() actually
updates pg_class.relfrozenxid with vacrel->NewRelfrozenXid --
presumably because the existing relfrozenxid appeared to be "in the
future" when we examine it in pg_class again. We see evidence that
this must have happened afterwards, when the closely related assertion
(used only in instrumentation code) fails:

From my patch:

> if (frozenxid_updated)
> {
> - diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
> + diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
> + Assert(diff > 0);
> appendStringInfo(&buf,
> _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
> - FreezeLimit, diff);
> + vacrel->NewRelfrozenXid, diff);
> }

Does anybody have any ideas about what might be going on here?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-03-31 02:00:18 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message Andres Freund 2022-03-31 00:26:18 Re: Mingw task for Cirrus CI