Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-31 22:38:04
Message-ID: 20230131223804.o7wlonfqlif2rcdo@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:
> On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
> > helper function for this, but it seems likely we'll need it in other places
> > too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> > pointer, as the line lengths / repetition otherwise end up making it hard to
> > read the code. For now I have TransactionIdRetreatSafely() be private to
> > procarray.c, but I expect we'll have to change that eventually.
>
> If TransactionIdRetreatSafely will be exposed outside procarray.c,
> then I think the xid pointer should be replaced with normal
> arguments/returns; both for parity with TransactionIdRetreatedBy

That's why I named one version *Retreat the other Retreated :)

I think it'll make the code easier to read in the other places too, the
variable names / function names in this space are uncomfortably long to
fit into 78chars..., particularly when there's two references to the
same variable in the same line.

> and to remove this memory store dependency in this hot code path.

I doubt that matters much here and the places it's going to be used
in. And presumably the compiler will inline it anyway. I'd probably make
it a static inline in the header too.

What's making me hesitate about exposing it is that it's quite easy to
get things wrong by using a wrong fxid or such.

> > + /*
> > + * FIXME, doubtful this is the best fix.
> > + *
> > + * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> > + * 0. Represent it as an xid from the future instead.
> > + */
> > + if (epoch == 0)
> > + return FullTransactionIdFromEpochAndXid(0, xid);
>
> Shouldn't this be an error condition instead, as this XID should not
> be able to appear?

If you mean error in the sense of ERROR, no, I don't think so. That code
tries hard to be able to check many tuples in a row. And if we were to
error out here, we'd not able to do that. We should still report those
tuples as corrupt, fwiw.

The reason this path is hit is that a test intentionally corrupts some
xids. So the path is reachable and we need to cope somehow.

I'm not really satisfied with this fix either - I mostly wanted to
include something sufficient to prevent assertion failures.

I had hoped that Mark would look at the amcheck bits and come up with
more complete fixes.

> on 0004:
>
> > - '0xffffffffffffffff'::xid8,
> > - '-1'::xid8;
> > + '0xefffffffffffffff'::xid8,
> > + '0'::xid8;
>
> The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
> test on 0xffff_fffE_ffff_ffff to test for input of our actual max
> value?

Probably a good idea.

> > @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
> > while (*str != '\0')
> > {
> > /* read next value */
> > - val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> > + raw_fxid = strtou64(str, &endp, 10);
> > +
> > + val = FullTransactionIdFromU64(raw_fxid);
> > + if (!InFullTransactionIdRange(raw_fxid))
> > + goto bad_format;
>
> With assertions enabled FullTransactionIdFromU64 will assert the
> InFullTransactionIdRange condition, meaning we wouldn't hit the branch
> into bad_format.
> I think these operations should be swapped, as parsing a snapshot
> shouldn't run into assertion failures like this if it can error
> normally.

Yep.

> Maybe this can be added to tests as well?

I'll check. I thought for a bit it'd not work because we'd perform range
checks on the xids, but we don't...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-31 22:47:35 Re: Show various offset arrays for heap WAL records
Previous Message Joel Jacobson 2023-01-31 21:59:10 Re: [PATCH] Fix old thinko in formula to compute sweight in numeric_sqrt().