Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-30 20:19:32
Message-ID: 20230130201932.6hhfyc35xr57zdb7@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > you might mean more than that?
>
> h->oldest_considered_running can be extremely old due to the global
> nature of the value and the potential existence of a snapshot in
> another database that started in parallel to a very old running
> transaction.

Here's a version that, I think, does not have that issue.

In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
helper function for this, but it seems likely we'll need it in other places
too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
pointer, as the line lengths / repetition otherwise end up making it hard to
read the code. For now I have TransactionIdRetreatSafely() be private to
procarray.c, but I expect we'll have to change that eventually.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe
TransactionIdRetreatClamped() is better?

I've been working on a test for vacuum_defer_cleanup_age. It does catch the
corruption at hand, but not much more. It's quite painful to write, right
now. Some of the reasons:
https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de

> > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > seems like a mighty invasive change to backpatch.
>
> I'm not sure either. Protecting against underflow by halving the
> effective valid value space is quite the intervention, but if it is
> necessary to make this work in a performant manner, it would be worth
> it. Maybe someone else with more experience can provide their opinion
> here.

The attached assertions just removes 1/2**32'ths of the space, by reserving
the xid range with the upper 32bit set as something that shouldn't be
reachable.

Still requires us to change the input routines to reject that range, but I
think that's a worthy tradeoff. I didn't find the existing limits for the
type to be documented anywhere.

Obviously something like that could only go into HEAD.

Greetings,

Andres Freund

Attachment Content-Type Size
v3-0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_ag.patch text/x-diff 5.0 KB
v3-0002-WIP-very-incomplete-test-for-vacuum_defer_cleanup.patch text/x-diff 6.0 KB
v3-0003-wip-amcheck-Fix-bugs-relating-to-64bit-xids.patch text/x-diff 1.7 KB
v3-0004-wip-stricter-validation-for-64bit-xids.patch text/x-diff 12.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2023-01-30 20:23:21 Re: Add SHELL_EXIT_CODE to psql
Previous Message Tom Lane 2023-01-30 20:06:46 Re: Making background psql nicer to use in tap tests