Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-09 16:50:10
Message-ID: CAEze2WjBffSYbyfAHbWeMO7Ra64m2i9OLfAjd8k1n=KD1YvqrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-01-07 16:29:23 -0800, Andres Freund wrote:
> > It's probably not too hard to fix specifically in this one place - we could
> > just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
> > it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I
> > suspect this might not be the only place running into problems with such
> > "before the universe" xids.
>
> I haven't found other problematic places in HEAD, but did end up find a less
> serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
> that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
> values that look likely to cause problems. Its "just" used in gist luckily.
>
> It's hard to find places that do this kind of arithmetic, we traditionally
> haven't had a helper for it. So it's open-coded in various ways.
> [...]
>
> The currently existing places I found, other than the ones in procarray.c,
> luckily don't seem to convert the xids to 64bit xids.

That's good to know.

> > For a bit I was thinking that we should introduce the notion that a
> > FullTransactionId can be from the past. Specifically that when the upper 32bit
> > are all set, we treat the lower 32bit as a value from before xid 0 using the
> > normal 32bit xid arithmetic. But it sucks to add overhead for that
> > everywhere.
> >
> > It might be a bit more palatable to designate one individual value,
> > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > before the start of the universe an xid point to...
>
> On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
> hacked up a patch that converts various fxid functions to inline functions
> with such assertions, and it indeed quickly catches the problem this thread
> reported, close to the source of the use.

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

The change only fixes the issue for FullTransactionId, which IMO is OK
- I don't think we need to keep xid->xid8->xid symmetric in cases of
xid8 wraparound.

> One issue with that is is that it'd reduce what can be input for the xid8
> type. But it's hard to believe that'd be a real issue?

Yes, it's unlikely anyone would ever hit that with our current WAL
format - we use 24 bytes /xid just to log it's use, so we'd use at
most epoch 0x1000_0000 in unrealistic scenarios. In addition;
technically, we already have (3*2^32 - 3) "invalid" xid8 values that
can never be produced in FullXidRelativeTo - those few extra invalid
values don't matter much to me except "even more special casing".

Kind regards,

Matthias van de Meent.

Attachment Content-Type Size
v1-0001-Prevent-underflow-of-xid8-epoch.patch application/octet-stream 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2023-01-09 17:15:12 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Laurenz Albe 2023-01-09 16:40:01 Re: Make EXPLAIN generate a generic plan for a parameterized query