Re: Amcheck verification of GiST and GIN

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2023-03-17 01:22:42
Message-ID: CAH2-WzkUHW7HfOwkwbQ56UzjGC970=1b+vTwRJUp74zP_RPQZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 16, 2023 at 4:48 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Some feedback on the GiST patch:

I see that the Bloom filter that's used to implement heapallindexed
verification fingerprints index tuples that are formed via calls to
gistFormTuple(), without any attempt to normalize-away differences in
TOAST input state. In other words, there is nothing like
verify_nbtree.c's bt_normalize_tuple() function involved in the
fingerprinting process. Why is that safe, though? See the "toast_bug"
test case within contrib/amcheck/sql/check_btree.sql for an example of
how inconsistent TOAST input state confused verify_nbtree.c's
heapallindexed verification (before bugfix commit eba775345d). I'm
concerned about GiST heapallindexed verification being buggy in
exactly the same way, or in some way that is roughly analogous.

I do have some concerns about there being analogous problems that are
unique to GiST, since GiST is an AM that gives opclass authors many
more choices than B-Tree opclass authors have. In particular, I wonder
if heapallindexed verification needs to account for how GiST
compression might end up breaking heapallindexed. I refer to the
"compression" implemented by GiST support routine 3 of GiST opclasses.
The existence of GiST support routine 7, the "same" routine, also
makes me feel a bit squeamish about heapallindexed verification -- the
existence of a "same" routine hints at some confusion about "equality
versus equivalence" issues.

In more general terms: heapallindexed verification works by
fingerprinting index tuples during the index verification stage, and
then performing Bloom filter probes in a separate CREATE INDEX style
heap-matches-index stage (obviously). There must be some justification
for our assumption that there can be no false positive corruption
reports due only to a GiST opclass (either extant or theoretical) that
follows the GiST contract, and yet allows an inconsistency to arise
that isn't really index corruption. This justification won't be easy
to come up with, since the GiST contract was not really designed with
these requirements in mind. But...we should try to come up with
something.

What are the assumptions underlying heapallindexed verification for
GiST? It doesn't have to be provably correct or anything, but it
should at least be empirically falsifiable. Basically, something that
says: "Here are our assumptions, if we were wrong in making these
assumptions then you could tell that we made a mistake because of X,
Y, Z". It's not always clear when something is corrupt. Admittedly I
have much less experience with GiST than other people, which likely
includes you (Andrey). I am likely missing some context around the
evolution of GiST. Possibly I'm making a big deal out of something
without it being unhelpful. Unsure.

Here is an example of the basic definition of correctness being
unclear, in a bad way: Is a HOT chain corrupt when its root
LP_REDIRECT points to an LP_DEAD item, or does that not count as
corruption? I'm pretty sure that the answer is ambiguous even today,
or was ambiguous until recently, at least. Hopefully the
verify_heapam.c HOT chain verification patch will be committed,
providing us with a clear *definition* of HOT chain corruption -- the
definition itself may not be the easy part.

On a totally unrelated note: I wonder if we should be checking that
internal page tuples have 0xffff as their offset number? Seems like
it'd be a cheap enough cross-check.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2023-03-17 01:32:26 Re: "current directory" in a server error message
Previous Message Andrew Dunstan 2023-03-17 01:14:35 Re: SQL/JSON revisited