Re: new heapcheck contrib module

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-08-04 16:44:23
Message-ID: CA+TgmoZsDFHS31Vd4o7f7hZv71ZvHr+kxQ5UYq6W9=qtVwR7sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 4, 2020 at 12:00 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> With indexes you tend to have redundancy in how relationships among
> pages are described. So you have siblings whose pointers must be in
> agreement (left points to right, right points to left), and it's not
> clear which one you should trust when they don't agree. It's not like
> simple heuristics get you all that far. I really can't think of a good
> one, and detecting corruption should mean detecting truly exceptional
> cases. I guess you could build a model based on Bayesian methods, or
> something like that. But that is very complicated, and only used when
> you actually have corruption -- which is presumably extremely rare in
> reality. That's very unappealing as a project.

I think it might be possible to distinguish between different types of
corruption and to separate, at least to some degree, the checking
associated with each type. I think one can imagine something that
checks the structure of a btree without regard to the contents. That
is, it cares that left and right links are consistent with each other
and with downlinks from the parent level. So it checks things like the
left link of the page to which my right link points is pointing back
to me, and that's also the page to which my parent's next downlink
points. It could also verify that there's a proper tree structure,
where every page has a well-defined tree level. So you assign the root
page level 1, and each time you traverse a downlink you assign that
page a level one larger. If you ever try to assign to a page a level
unequal to the level previously assigned to it, you report that as a
problem. You can check, too, that if a page does not have a left or
right link, it's actually the last page at that level according what
you saw at the parent, grandparent, etc. levels. Finally, you can
check that all of the max-level pages you can find are leaf pages, and
the others are all internal pages. All of this structural stuff can be
verified without caring a whit about what keys you've got or what they
mean or whether there's even a heap associated with this index.

Now a second type of checking, which can also be done without regard
to keys, is checking that the TIDs in the index point to TIDs that are
on heap pages that actually exist, and that the corresponding items
are not unused, nor are they tuples which are not the root of a HOT
chain. Passing a check of this type doesn't prove that the index and
heap are consistent, but failing it proves that they are inconsistent.
This kind of check can be done on every leaf index page you can find
by any means even if it fails the structural checks described above.
Failure of these checks on one page does not preclude checking the
same invariants for other pages. Let's call this kind of thing "basic
index-heap sanity checking."

A third type of checking is to verify the relationship between the
index keys within and across the index pages: are the keys actually in
order within a page, and are they in order across pages? The first
part of this can be checked individually for each page pretty much no
matter what other problems we may have; we only have to abandon this
checking for a particular page if it's total garbage and we cannot
identify any index items on the page at all. The second part, though,
has the problem you mention. I think the solution is to skip the
second part of the check for any pages that failed related structural
checks. For example, if my right sibling thinks that I am not its left
sibling, or my right sibling and I agree that we are siblings but do
not agree on who our parent is, or if that parent does not agree that
we have the same sibling relationship that we think we have, then we
should report that problem and forget about issuing any complaints
about the relationship between my key space and that sibling's key
space. The internal consistency of each page with respect to key
ordering can still be verified, though, and it's possible that my key
space can be validly compared to the key space of my other sibling, if
the structural checks pass on that side.

A fourth type of checking is to verify the index key against the keys
in the heap tuples to which they point, but only for index tuples that
passed the basic index-heap sanity checking and where the tuples have
not been pruned. This can be sensibly done even if the structural
checks or index-ordering checks have failed.

I don't mean to suggest that one would implement all of these things
as separate phases; that would be crazy expensive, and what if things
changed by the time you visit the page? Rather, the checks likely
ought to be interleaved, just keeping track internally of which things
need to be skipped because prerequisite checks have already failed.

Aside from providing a way to usefully continue after errors, this
would also be useful in certain scenarios where you want to know what
kind of corruption you have. For example, suppose that I start getting
wrong answers from index lookups on a particular index. Upon
investigation, it turns out that my last glibc update changed my OS
collation definitions for the collation I'm using, and therefore it is
to be expected that some of my keys may appear to be out of order with
respect to the new definitions. Now what I really want to know before
running REINDEX is that this is the only problem I have. It would be
amazing if I could run the tool and have it give me a list of problems
so that I could confirm that I have only index-ordering problems, not
any other kind, and even more amazing if it could tell me the specific
keys that were affected so that I could understand exactly how the
sorting behavior changed. If I were to discover that my index also has
structural problems or inconsistencies with the heap, then I'd know
that it couldn't be right to blame it only the collation update;
something else has gone wrong.

I'm speaking here with fairly limited knowledge of the details of how
all this actually works and, again, I'm not trying to suggest that you
or anyone is obligated to do any work on this, or that it would be
easy to accomplish or worth the time it took. I'm just trying to
sketch out what I see as maybe being theoretically possible, and why I
think it would be useful if it did.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-04 16:50:29 Re: Can a background worker exist without shared memory access for EXEC_BACKEND cases?
Previous Message Andrey M. Borodin 2020-08-04 16:32:57 Re: Amcheck: do rightlink verification with lock coupling