Re: HOT chain validation in verify_heapam()

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-11-14 22:42:16
Message-ID: CAH2-WznBBTm10chJN-OjPJHLEYOsNXUdh8kW0uV4vfHcR3K2TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 14, 2022 at 2:33 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Why don't you think that there is corruption?
>
> I looked at the state after the test and the complaint is bogus. It's caused
> by the patch ignoring the cur->xmax == next->xmin condition if next->xmin is
> FrozenTransactionId. The isolationtester test creates a situation where that
> leads to verify_heapam() considering tuples to be part of the same chain even
> though they aren't.

Having looked at your isolation test in more detail, it seems like you
were complaining about a fairly specific and uncontroversial
shortcoming in the patch itself: it complains about a newly inserted
tuple that gets frozen. It thinks that the inserted tuple is part of
the same HOT chain (or at least the same update chain) as other tuples
on the same heap page, when in fact it's just some wholly unrelated
tuple/logical row. It seems as if the new amcheck code doesn't get all
the details of validating HOT chains right, and so jumps the gun here,
reporting corruption based on a faulty assumption that the frozen-xmin
tuple is in any way related to the chain.

I was confused about whether we were talking about this patch, bugs in
HEAD, or both.

> > Maybe you're right about the proposed new functionality getting things wrong
> > with your adversarial isolation test, but I seem to have missed the
> > underlying argument. Are you just talking about regular update chains here,
> > not HOT chains? Something else?
>
> As I noted, it happens regardless of HOT being used or not. The tuples aren't
> part of the same chain, but the patch treats them as if they were. The reason
> the patch considers them to be part of the same chain is precisely the
> FrozenTransactionId condition I was worried about. Just because t_ctid points
> to a tuple on the same page and the next tuple has xmin ==
> FrozenTransactionId, doesn't mean they're part of the same chain. Once you
> encounter a tuple with a frozen xmin you simply cannot assume it's part of the
> chain you've been following.

Got it.

That seems relatively straightforward and uncontroversial to me,
because it's just how code like heap_hot_search_buffer (HOT chain
traversal code) works already. The patch got some of those details
wrong, and should be revised.

What does this have to tell us, if anything, about the implications
for code on HEAD? I don't see any connection between this problem and
the possibility of a live bug on HEAD involving freezing later tuple
versions in a HOT chain, leaving earlier non-frozen versions behind to
break HOT chain traversal code. Should I have noticed such a
connection?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-11-14 22:58:10 Re: HOT chain validation in verify_heapam()
Previous Message Andrew Dunstan 2022-11-14 22:41:54 meson oddities