Re: amcheck's verify_heapam(), and HOT chain verification

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: amcheck's verify_heapam(), and HOT chain verification
Date: 2021-11-06 03:18:30
Message-ID: 4A85CA3D-DCDD-494C-9D79-99883610AF50@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 5, 2021, at 7:51 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> I recently pushed a commit which formalized what we expect of HOT
> chains during pruning -- see assertions and comments added by commit
> 5cd7eb1f, "Add various assertions to heap pruning code". I notice that
> verify_heapam() doesn't really do anything with HOT chains, and I'd
> say that that's a gap. Fortunately it should not be terribly difficult
> to fill in this gap.
>
> The short version is this: I think that verify_heapam() should
> validate whole HOT chains, not just individual heap-only tuples. This
> can be accomplished by taking a similar approach to the approach taken
> in heap_prune_chain() already, for pruning.
>
> For a given heap page, pruning fills in its PruneState variable to
> build a coherent picture of the state of all HOT chains on the entire
> page, before modifying anything (barring hint bit setting). We only
> reach heap_page_prune_execute() when we have the whole picture about
> HOT chains. This high level approach is essential when pruning.
> Consider LP_REDIRECT items. We might need to set an existing
> LP_REDIRECT item to LP_DEAD, rendering an entire HOT chain dead.
> Whether or not that's the appropriate course of action for a given
> LP_REDIRECT item will depend on the state of the HOT chain as a whole.
>
> The HOT chain could of course have several heap-only tuples, and only
> one visible tuple means that the LP_REDIRECT root item must not be set
> LP_DEAD -- it should be made to link to the remaining live heap-only
> tuple instead (while the dead intermediate heap-only tuples become
> LP_UNUSED, never LP_DEAD). This is made even more tricky by the fact
> that we have to consider HOT updaters with aborted transactions during
> pruning.
>
> We even have to consider the case where a HOT updater aborts, but
> another HOT updater tries again and commits. Now we have a heap-only
> tuple that is "disconnected from its original HOT chain" (or will be
> until the next prune). Having a stray/disconnected heap-only tuple
> like this ought to be fine -- the next pruning operation will get to
> it whenever (see the special handling for these aborted heap-only
> tuples at the top of heap_prune_chain()). But any disconnected
> heap-only tuple had better be from an aborted xact -- otherwise we
> have corruption. This is corruption that looks like index corruption,
> because sequential scans still give correct answers (I've seen this
> several times before, ask me how). But it's not index corruption,
> contrary to appearances -- it's heap corruption.
>
> Here are some specific checks I have in mind:
>
> * Right now, we throw an error when an LP_REDIRECT points to an
> LP_UNUSED item. But that existing test should go further, like the
> assertions I added recently. They should make sure that the
> LP_REDIRECT links directly to an LP_NORMAL item, which has tuple
> storage. And, the stored tuple must be a heap-only tuple.
>
> Note, in particular, that it's not okay if the LP_REDIRECT points to
> an LP_DEAD item (which amcheck won't complain about today). That is at
> least my position on it, and so far it looks like the right position
> (no assertion failures reported yet).
>
> * HOT chain validation: All visible/committed heap-only tuples should
> be reachable through a "recognizable root item for their HOT chain, or
> an earlier member of the HOT chain via its t_ctid link". The root item
> can either be an LP_REDIRECT, or a regular not-heap-only tuple -- but
> it can't be anything else.
>
> * All other heap-only tuples (all heap-only tuples that don't seem to
> be part of a valid HOT chain) must be from aborted transactions -- as
> I said, this has to be corruption (heap corruption disguised as index
> corruption, even).
>
> Note that we're very permissive about broken HOT chains -- in general
> a heap-only tuple's t_ctid link can have pretty wild values, and
> that's okay. For example, it can point past the current bounds of the
> page's line pointer array. Everything is still robust because of the
> way that we validate HOT chains during chain traversal -- we match
> tuple N's xmax to tuple N+1's xmin, all the time, inside code like
> heap_hot_search_buffer() that is routinely used during index scans.
> However, the root item of a HOT chain (usually an LP_REDIRECT)
> absolutely must point to a valid heap-only tuple at all times --
> that's very different indeed. That's the starting point for an index
> scan -- it is the TID that is stored in indexes that needs to be
> stable over time. And so pruning needs to leave behind coherent
> looking whole entire HOT chains, just as we ought to verify that
> pruning consistently gets it right.

Thanks, Peter, for this analysis. It's getting a bit late in the day, but I'll try this out tomorrow, I expect.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-11-06 03:45:19 Re: Schema variables - new implementation for Postgres 15
Previous Message Peter Geoghegan 2021-11-06 02:51:12 amcheck's verify_heapam(), and HOT chain verification