amcheck's verify_heapam(), and HOT chain verification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: amcheck's verify_heapam(), and HOT chain verification
Date: 2021-11-06 02:51:12
Message-ID: CAH2-Wznphpd490o+ur_bi6-YmC47hRu3Qyxod72RLqC-Uvuhcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I recently pushed a commit which formalized what we expect of HOT
chains during pruning -- see assertions and comments added by commit
5cd7eb1f, "Add various assertions to heap pruning code". I notice that
verify_heapam() doesn't really do anything with HOT chains, and I'd
say that that's a gap. Fortunately it should not be terribly difficult
to fill in this gap.

The short version is this: I think that verify_heapam() should
validate whole HOT chains, not just individual heap-only tuples. This
can be accomplished by taking a similar approach to the approach taken
in heap_prune_chain() already, for pruning.

For a given heap page, pruning fills in its PruneState variable to
build a coherent picture of the state of all HOT chains on the entire
page, before modifying anything (barring hint bit setting). We only
reach heap_page_prune_execute() when we have the whole picture about
HOT chains. This high level approach is essential when pruning.
Consider LP_REDIRECT items. We might need to set an existing
LP_REDIRECT item to LP_DEAD, rendering an entire HOT chain dead.
Whether or not that's the appropriate course of action for a given
LP_REDIRECT item will depend on the state of the HOT chain as a whole.

The HOT chain could of course have several heap-only tuples, and only
one visible tuple means that the LP_REDIRECT root item must not be set
LP_DEAD -- it should be made to link to the remaining live heap-only
tuple instead (while the dead intermediate heap-only tuples become
LP_UNUSED, never LP_DEAD). This is made even more tricky by the fact
that we have to consider HOT updaters with aborted transactions during
pruning.

We even have to consider the case where a HOT updater aborts, but
another HOT updater tries again and commits. Now we have a heap-only
tuple that is "disconnected from its original HOT chain" (or will be
until the next prune). Having a stray/disconnected heap-only tuple
like this ought to be fine -- the next pruning operation will get to
it whenever (see the special handling for these aborted heap-only
tuples at the top of heap_prune_chain()). But any disconnected
heap-only tuple had better be from an aborted xact -- otherwise we
have corruption. This is corruption that looks like index corruption,
because sequential scans still give correct answers (I've seen this
several times before, ask me how). But it's not index corruption,
contrary to appearances -- it's heap corruption.

Here are some specific checks I have in mind:

* Right now, we throw an error when an LP_REDIRECT points to an
LP_UNUSED item. But that existing test should go further, like the
assertions I added recently. They should make sure that the
LP_REDIRECT links directly to an LP_NORMAL item, which has tuple
storage. And, the stored tuple must be a heap-only tuple.

Note, in particular, that it's not okay if the LP_REDIRECT points to
an LP_DEAD item (which amcheck won't complain about today). That is at
least my position on it, and so far it looks like the right position
(no assertion failures reported yet).

* HOT chain validation: All visible/committed heap-only tuples should
be reachable through a "recognizable root item for their HOT chain, or
an earlier member of the HOT chain via its t_ctid link". The root item
can either be an LP_REDIRECT, or a regular not-heap-only tuple -- but
it can't be anything else.

* All other heap-only tuples (all heap-only tuples that don't seem to
be part of a valid HOT chain) must be from aborted transactions -- as
I said, this has to be corruption (heap corruption disguised as index
corruption, even).

Note that we're very permissive about broken HOT chains -- in general
a heap-only tuple's t_ctid link can have pretty wild values, and
that's okay. For example, it can point past the current bounds of the
page's line pointer array. Everything is still robust because of the
way that we validate HOT chains during chain traversal -- we match
tuple N's xmax to tuple N+1's xmin, all the time, inside code like
heap_hot_search_buffer() that is routinely used during index scans.
However, the root item of a HOT chain (usually an LP_REDIRECT)
absolutely must point to a valid heap-only tuple at all times --
that's very different indeed. That's the starting point for an index
scan -- it is the TID that is stored in indexes that needs to be
stable over time. And so pruning needs to leave behind coherent
looking whole entire HOT chains, just as we ought to verify that
pruning consistently gets it right.

--
Peter Geoghegan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-11-06 03:18:30 Re: amcheck's verify_heapam(), and HOT chain verification
Previous Message Justin Pryzby 2021-11-06 02:48:48 Re: Draft release notes for next week's releases