Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-08 00:29:23
Message-ID: 20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thomas, CCing you because of the 64bit xid representation aspect.

On 2023-01-06 00:39:44 +0300, Michail Nikolaev wrote:
> I apologize for the direct ping, but I think your snapshot scalability
> work in PG14 could be related to the issue.

Good call!

> The TransactionIdRetreatedBy implementation looks correct... But with
> txid_current=212195 I see errors like "could not access status of
> transaction 58643736"...
> So, maybe vacuum_defer_cleanup_age just highlights some special case
> (something with "previous" xids on the left side of zero?)....

I think the bug is close to TransactionIdRetreatedBy(). Arguably in
FullXidRelativeTo(). Or a more fundamental data representation issue with
64bit xids.

To explain, here's a trace to the bottom of GetSnapshotData() leading to the
problem:

In the case I'm looking at here we end up with 720:
oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);
and xmin is 255271, both correct.

Then in TransactionIdRetreatedBy:
/* apply vacuum_defer_cleanup_age */
def_vis_xid_data =
TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);

things start to be iffy. Because we retreat by vacuum_defer_cleanup_age, which
was set to txid_current() in scheme.sql, and the xmin above is that xid,
TransactionIdRetreatedBy() first ends up with 0. It then backtracks further to
the highest 32 xid (i.e. 4294967295). So far so good.

We could obviously end up with values further in the past as well, if
vacuum_defer_cleanup_age were larger.

Things start to seriously go off the rails when we convert that 32bit xid to
64 bit with:
def_vis_fxid = FullXidRelativeTo(latest_completed, def_vis_xid);
which returns {value = 18446744073709551615}, which 0-1 in 64bit.

However, as 64bit xids are not supposed to wrap around, we're in trouble -
it's an xid *very* far into the future. Allowing things to be pruned that
shouldn't, because everything is below that.

I don't quite know how to best fix this. 4294967295 is the correct result for
TransactionIdRetreatedBy() in this case, and it'd not cause problems for
FullXidRelativeTo() if we actually had wrapped around the xid counter before
(since then we'd just represent it as a fxid "of the proper epoch").

Basically, in contrast to 32bit xids, 64bit xids cannot represent xids from
before the start of the universe, whereas with the modulo arithmetic of 32bit
that's not a real problem.

It's probably not too hard to fix specifically in this one place - we could
just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I
suspect this might not be the only place running into problems with such
"before the universe" xids.

For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic. But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid point to...

FullXidRelativeTo() did assert that the input 32bit xid is in a reasonable
range, but unfortunately didn't do a similar check for the 64bit case.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-01-08 00:33:38 Re: drop postmaster symlink
Previous Message Peter Geoghegan 2023-01-07 23:41:29 Re: pgsql: Delay commit status checks until freezing executes.