Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-09 19:34:26
Message-ID: 20230109193426.hcgn27ogtfuv7vyl@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Robert, Mark, CCing you because of the amcheck aspect (see below).

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:
> On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > For a bit I was thinking that we should introduce the notion that a
> > > FullTransactionId can be from the past. Specifically that when the upper 32bit
> > > are all set, we treat the lower 32bit as a value from before xid 0 using the
> > > normal 32bit xid arithmetic. But it sucks to add overhead for that
> > > everywhere.
> > >
> > > It might be a bit more palatable to designate one individual value,
> > > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > > before the start of the universe an xid point to...
> >
> > On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
> > hacked up a patch that converts various fxid functions to inline functions
> > with such assertions, and it indeed quickly catches the problem this thread
> > reported, close to the source of the use.
>
> Wouldn't it be enough to only fix the constructions in
> FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
> not occur with the patch), and (optionally) bump the first XID
> available for any cluster to (FirstNormalXid + 1) to retain the 'older
> than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().

Fixing it in FullXidRelativeTo() doesn't seem quite right, while it's ok to
just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age,
it doesn't see necessarily correct for other cases.

> The change only fixes the issue for FullTransactionId, which IMO is OK
> - I don't think we need to keep xid->xid8->xid symmetric in cases of
> xid8 wraparound.

I think we should keep that symmetric, it just gets too confusing / easy to
miss bugs otherwise.

> > One issue with that is is that it'd reduce what can be input for the xid8
> > type. But it's hard to believe that'd be a real issue?
>
> Yes, it's unlikely anyone would ever hit that with our current WAL
> format - we use 24 bytes /xid just to log it's use, so we'd use at
> most epoch 0x1000_0000 in unrealistic scenarios. In addition;
> technically, we already have (3*2^32 - 3) "invalid" xid8 values that
> can never be produced in FullXidRelativeTo - those few extra invalid
> values don't matter much to me except "even more special casing".

Yep. The attached 0002 is a first implementation of this.

The new assertions found at least one bug in amcheck, and one further example
of the problem of representing past 32 xids in 64bit:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
subsequent calls) to determine whether epoch needs to be reduced.

2) One test generates includes an xid from the future (4026531839). Which
causes epoch to wrap around (via the epoch--) in
FullTransactionIdFromXidAndCtx(). I've hackily fixed that by just representing
it as an xid from the future instead. But not sure that's a good answer.

A different approach would be to represent fxids as *signed* 64bit
integers. That'd of course loose more range, but could represent everything
accurately, and would have a compatible on-disk representation on two's
complement platforms (all our platforms). I think the only place that'd need
special treatment is U64FromFullTransactionId() / its callers. I think this
might be the most robust approach.

Greetings,

Andres Freund

Attachment Content-Type Size
v2-0001-Fix-corruption-due-to-vacuum_defer_cleanup_age-un.patch text/x-diff 4.5 KB
v2-0002-wip-amcheck-Fix-bugs-relating-to-64bit-xids.patch text/x-diff 1.7 KB
v2-0003-wip-stricter-validation-for-64bit-xids.patch text/x-diff 12.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-09 19:44:43 Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Previous Message Tom Lane 2023-01-09 18:52:16 Re: [PATCH] random_normal function