Re: HOT chain validation in verify_heapam()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-11-10 00:15:12
Message-ID: 20221110001512.vvypgmbfcxrsewel@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-11-09 15:03:39 -0800, Peter Geoghegan wrote:
> > > + /*
> > > + * Add a line pointer offset to the predecessor array if xmax is
> > > + * matching with xmin of next tuple (reaching via its t_ctid).
> > > + * Prior to PostgreSQL 9.4, we actually changed the xmin to
> > > + * FrozenTransactionId
> >
> > I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
> > of getting that right seems low and I don't see us gaining much by even trying.
>
> This is the kind of comment that I'd usually agree with, but I
> disagree in this instance because of special considerations that apply
> to amcheck (or should IMV apply, at least). We're living in a world
> where we have to assume that the pre-9.4 format can occur in the
> field. If we can't get it right in amcheck, what chance do we have
> with other new code that tickles the same areas? I think that we need
> to support obsolescent heapam representations (both
> FrozenTransactionId and xvac) here on general principle.

To me this is extending the problem into more areas rather than reducing
it. I'd have *zero* confidence in any warnings that amcheck issued that
involved <9.4 special cases.

We've previously discussed adding pg_class column tracking the PG version that
last scanned the whole relation. We really should get to that one of these
decades :(.

> Besides, why not accept some small chance of getting this wrong? The
> worst that can happen is that we'll have a relatively benign bug. If
> we get it wrong then it's a short term problem, but also an
> opportunity to be less wrong in the future -- including in places
> where the consequences of being wrong are much more serious.

I think it doesn't just affect the < 9.4 path, but also makes us implement
things differently for >= 9.4. And we loose some accuracy due to that.

> > I doubt it is correct to enter this path with next_xmin ==
> > FrozenTransactionId. This is following a ctid chain that we normally wouldn't
> > follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin condition.
>
> We should never see FrozenTransactionId in an xmax field (nor should
> it be returned by HeapTupleHeaderGetUpdateXid() under any
> circumstances).

The field we check for FrozenTransactionId in the code I was quoting is the
xmin of the follower tuple. We follow the chain if either
cur->xmax == next->xmin or if next->xmin == FrozenTransactionId

What I'm doubting is the FrozenTransactionId path.

> Anyway, it follows that we cannot expect "next_xmin ==
> FrozenTransactionId", because that would mean that we'd called
> HeapTupleHeaderGetUpdateXid() which returned FrozenTransactionId -- an
> impossibility. (Maybe we should be checking that it really is an
> impossibility by checking the HeapTupleHeaderGetUpdateXid() return
> value, but that should be enough.)

next_xmin is acquired via HeapTupleHeaderGetXmin(next_htup), not
HeapTupleHeaderGetUpdateXid(cur_typ).

> > I don't immediately see what prevents the frozen tuple being from an entirely
> > different HOT chain than the two tuples pointing to it.
>
> In my view it simply isn't possible for a valid HOT chain to be in
> this state in the first place. So by definition it wouldn't be a HOT
> chain.

We haven't done any visibility checking at this point and my whole point is
that there's no guarantee that the pointed-to tuple actually belongs to the
same hot chain, given that we follow as soon as "xmin == FrozenXid". So the
pointing tuple might be an orphaned tuple.

> That would be a form of corruption, which is something that
> would probably be detected by noticing orphaned heap-only tuples
> (heap-only tuples not reachable from some root item on the same page,
> or some other intermediary heap-only tuple reachable from a root
> item).

You're saying that there's no way that there's a tuple pointing to another
tuple on the same page, which the pointed-to tuple belonging to a different
HOT chain?

I'm fairly certain that that at least used to be possible, and likely is still
possible. Isn't that pretty much what you'd expect to happen if there's
concurrent aborts leading to abandoned hot chains?

> > Can't we have a an update chain that is e.g.
> > xmin 10, xmax 5 -> xmin 5, xmax invalid
> >
> > and a vacuum cutoff of 7? That'd preent the first tuple from being removed,
> > but would allow 5 to be frozen.
>
> I don't see how that can be possible. That is contradictory, and
> cannot possibly work, since it supposes a situation where every
> possible MVCC snapshot sees the update that generated the
> second/successor tuple as committed, while at the same time also
> somehow needing the original tuple to stay in place. Surely both
> things can never be true at the same time.

The xmin horizon is very coarse grained. Just because it is 7 doesn't mean
that xid 10 is still running. All it means that one backend or slot has an
xmin or xid of 7.

s1: acquire xid 5
s2: acquire xid 7
s3: acquire xid 10

s3: insert
s3: commit
s1: update
s1: commit

s2: get a new snapshot, xmin 7 (or just hold no snapshot)

At this point the xmin horizon is 7. The first tuple's xmin can't be
frozen. The second tuple's xmin can be.

Note that indeed no backend could actually see the first tuple - xid 7's
snapshot won't have it marked as running, therefore it will be invisible. But
we will think the first tuple is recently dead rather than burried deeply,
because the xmin horizon is only 7.

> I believe you're right that an update chain that looks like this one
> is possible. However, I don't think it's possible for
> OldestXmin/FreezeLimit to take on a value like that (i.e. a value that
> "skewers" the update chain like this, the value 7 from your example).
> We ought to be able to rely on an OldestXmin value that can never let
> such a situation emerge. Right?

I don't see anything that'd guarantee that currently, nor do immediately see a
possible way to get there.

What do you think prevents such an OldestXmin?

I might be missing something myself...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-11-10 00:25:48 Re: Out-of-date clang/llvm version lists in PGAC_LLVM_SUPPORT
Previous Message David G. Johnston 2022-11-09 23:03:42 Re: [DOCS] Stats views and functions not in order?