Re: HOT chain validation in verify_heapam()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-11-14 20:58:06
Message-ID: 20221114205806.c54z6e742scdgf33@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-11-14 09:50:48 -0800, Peter Geoghegan wrote:
> On Mon, Nov 14, 2022 at 9:38 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Anyway, I played a bit around with this. It's hard to hit, not because we
> > somehow won't choose such a horizon, but because we'll commonly prune the
> > earlier tuple version away due to xmax being old enough.
>
> That must be a bug, then. Since, as I said, I can't see how it could
> possibly be okay to freeze an xmin of tuple in a HOT chain without
> also making sure that it has no earlier versions left behind.

Hard to imagine us having bugs in this code. Ahem.

I really wish I knew of a reasonably complex way to utilize coverage guided
fuzzing on heap pruning / vacuuming.

I wonder if we ought to add an error check to heap_prepare_freeze_tuple()
against this scenario. We're working towards being more aggressive around
freezing, which will make it more likely to hit corner cases around this.

> If there are earlier versions that we have to go through to get to the
> frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT
> chain traversal logic in code like heap_hot_search_buffer in a rather
> obvious way.
>
> HOT chain traversal logic code will interpret the frozen xmin from the
> tuple as FrozenTransactionId (not as its raw xmin). So traversal is
> just broken in this scenario.
>

Which'd still be fine if the whole chain were already "fully dead". One way I
think this can happen is <= PG 13's HEAPTUPLE_DEAD handling in
lazy_scan_heap().

I now suspect that the seemingly-odd "We will advance past RECENTLY_DEAD
tuples just in case there's a DEAD one after them;" logic in
heap_prune_chain() might be required for correctness. Which IIRC we'd been
talking about getting rid elsewhere?

<tinkers>

At least as long as correctness requires not ending up in endless loops -
indeed. We end up with lazy_scan_prune() endlessly retrying. Without a chance
to interrupt. Shouldn't there at least be a CFI somewhere? The attached
isolationtester spec has a commented out test for this.

I think the problem partially is that the proposed verify_heapam() code is too
"aggressive" considering things to be part of the same hot chain - which then
means we have to be very careful about erroring out.

The attached isolationtester test triggers:
"unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
"updated version at offset 3 is also the updated version of tuple at offset %u"

Despite there afaict not being any corruption. Worth noting that this happens
regardless of hot/non-hot updates being used (uncomment s3ci to see).

> > It *is* possible to
> > hit, if the horizon increases between the two tuple version checks (e.g. if
> > there's another tuple inbetween that we check the visibility of).
>
> I suppose that that's the detail that "protects" us, then -- that
> would explain the apparent lack of problems in the field. Your
> sequence requires 3 sessions, not just 2.

One important protection right now is that vacuumlazy.c uses a more
pessimistic horizon than pruneheap.c. Even if visibility determinations within
pruning recompute the horizon, vacuumlazy.c won't freeze based on the advanced
horizon. I don't quite know where we we'd best put a comment with a warning
about this fact.

Greetings,

Andres Freund

Attachment Content-Type Size
skewer.diff text/x-diff 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-14 21:03:25 Re: Add sub-transaction overflow status in pg_stat_activity
Previous Message Thomas Munro 2022-11-14 20:42:26 Re: Suppressing useless wakeups in walreceiver