Re: A design for amcheck heapam verification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A design for amcheck heapam verification
Date: 2017-08-29 19:58:17
Message-ID: CAH2-Wz=fUzCh6wmk8cjaqhsSTNBh2RLrzVwX9Pgn0575ZVp13g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 11, 2017 at 4:30 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I spent only a few hours writing a rough prototype, and came up with
> something that does an IndexBuildHeapScan() scan following the
> existing index verification steps. Its amcheck callback does an
> index_form_tuple() call, hashes the resulting IndexTuple (heap TID and
> all), and tests it for membership of a bloom filter generated as part
> of the main B-Tree verification phase. The IndexTuple memory is freed
> immediately following hashing.

I attach a cleaned-up version of this. It has extensive documentation.
My bloom filter implementation is broken out as a separate patch,
added as core infrastructure under "lib".

I do have some outstanding concerns about V1 of the patch series:

* I'm still uncertain about the question of using IndexBuildHeapScan()
during Hot Standby. It seems safe, since we're only using the
CONCURRENTLY/AccessShareLock path when this happens, but someone might
find that objectionable on general principle. For now, in this first
version, it remains possible to call IndexBuildHeapScan() during Hot
Standby, to allow the new verification to work there.

* The bloom filter has been experimentally verified, and is based on
source material which is directly cited. It would nevertheless be
useful to have the hashing stuff scrutinized, because it's possible
that I've overlooked some subtlety.

This is only the beginning for heapam verification. Comprehensive
coverage can be added later, within routines that specifically target
some table, not some index.

While this patch series only adds index-to-heap verification for
B-Tree indexes, I can imagine someone adopting the same technique to
verifying that other access methods are consistent with their heap
relation. For example, it would be easy to do this with hash indexes.
Any other index access method where the same high-level principle that
I rely on applies can do index-to-heap verification with just a few
tweaks. I'm referring to the high-level principle that comments
specifically point out in the patch: that REINDEX leaves you with an
index structure that has exactly the same entries as the old index
structure had, though possibly with fewer dead index tuples. I like my
idea of reusing IndexBuildHeapScan() for verification. Very few new
LOCs are actually added to amcheck by this patch, and
IndexBuildHeapScan() is itself tested.

--
Peter Geoghegan

Attachment Content-Type Size
0002-Add-amcheck-verification-of-indexes-against-heap.patch text/x-patch 28.9 KB
0001-Add-bloom-filter-data-structure-implementation.patch text/x-patch 12.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-08-29 20:14:33 Re: Improving overflow checks when adding tuple to PGresult Re: [GENERAL] Retrieving query results
Previous Message Tom Lane 2017-08-29 19:24:46 Re: Improving overflow checks when adding tuple to PGresult Re: [GENERAL] Retrieving query results