Re: Amcheck verification of GiST and GIN

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2023-03-27 02:34:49
Message-ID: CAH2-WzndEBSGDWSARP2=CUeFU5WWp+qtHtfK-bchMraWO1YJ9Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 19, 2023 at 4:00 PM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> After several attempts to corrupt GiST with this 0.000001 epsilon
> adjustment tolerance I think GiST indexing of points is valid.
> Because intersection for search purposes is determined with the same epsilon!
> So it's kind of odd
> postgres=# select point(0.0000001,0)~=point(0,0);
> ?column?
> ----------
> t
> (1 row)
> , yet the index works correctly.

I think that it's okay, provided that we can assume deterministic
behavior in the code that forms new index tuples. Within nbtree,
operator classes like numeric_ops are supported by heapallindexed
verification, without any requirement for special normalization code
to make it work correctly as a special case. This is true even though
operator classes such as numeric_ops have similar "equality is not
equivalence" issues, which come up in other areas (e.g., nbtree
deduplication, which must call support routine 4 during a CREATE INDEX
[1]).
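
To make the "equality is not equivalence" point concrete, here is a minimal sketch. It uses Python's decimal module purely as a stand-in for a numeric-like type, and a SHA-256 hash as a stand-in for a fingerprint; neither is what PostgreSQL or amcheck actually uses. The helper name fingerprint() is hypothetical. The point is only that two values can compare equal while having different representations, and that this is harmless for fingerprinting as long as the same input always yields the same representation.

```python
from decimal import Decimal
import hashlib


def fingerprint(value: Decimal) -> str:
    """Hash the exact textual form of the value (hypothetical helper).

    str() on a Decimal preserves trailing zeros, so values that compare
    equal can still hash differently.
    """
    return hashlib.sha256(str(value).encode("ascii")).hexdigest()


a = Decimal("1.0")
b = Decimal("1.00")

# Equal according to the comparison operator...
equal = (a == b)

# ...but not equivalent: the representations (and hashes) differ.
same_image = (fingerprint(a) == fingerprint(b))

# Determinism is what matters: the same input always gives the same hash.
stable = (fingerprint(a) == fingerprint(Decimal("1.0")))

print(equal, same_image, stable)
```

In this sketch the check only ever compares a fingerprint built from an input against a fingerprint rebuilt from that same input, so the fact that 1.0 and 1.00 are "equal" never has to be normalized away.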

The important principle is that amcheck must always be able to produce
a consistent fingerprintable binary output given the same input (the
same heap tuple/Datum array). This must work across all operator
classes that play by the rules for GiST operator classes. We *can*
tolerate some variation here. Well, we really *have* to tolerate a
little of this kind of variation in order to deal with differences in
TOAST input state (e.g., compressed versus uncompressed
datums)...but I hope that that's the only complicating factor
here, for GiST (as it is for nbtree). Note that we already rely on the
fact that index_form_tuple() uses palloc0() (not plain palloc) in
verify_nbtree.c, for the obvious reason.
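
As a rough illustration of why zeroed allocation matters for fingerprinting, the sketch below simulates a tuple image with alignment padding. Everything here is an assumption made for the example: the 8-byte layout, the helper names form_tuple(), form_zeroed(), form_dirty() and fingerprint(), and the use of SHA-256 in place of whatever amcheck does internally. Presumably the concern is that padding bytes left uninitialized would make two tuples formed from identical input differ at the byte level.

```python
import hashlib
import struct

# Hypothetical layout: 1-byte flag, 3 bytes of alignment padding, 4-byte int.
TUPLE_SIZE = 8


def form_tuple(flag: int, key: int, buf: bytearray) -> bytes:
    """Write the fields into buf, leaving padding bytes 1..3 untouched."""
    buf[0] = flag
    struct.pack_into("<i", buf, 4, key)
    return bytes(buf)


def form_zeroed(flag: int, key: int) -> bytes:
    # Analogous to a palloc0-style allocation: the buffer starts all-zero.
    return form_tuple(flag, key, bytearray(TUPLE_SIZE))


def form_dirty(flag: int, key: int, garbage: int) -> bytes:
    # Analogous to a plain allocation: the buffer holds leftover bytes.
    return form_tuple(flag, key, bytearray([garbage]) * TUPLE_SIZE)


def fingerprint(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


# Same logical input, zeroed buffers: identical images and fingerprints.
z1 = fingerprint(form_zeroed(1, 42))
z2 = fingerprint(form_zeroed(1, 42))

# Same logical input, different leftover bytes: fingerprints disagree.
d1 = fingerprint(form_dirty(1, 42, 0xAA))
d2 = fingerprint(form_dirty(1, 42, 0x55))

print(z1 == z2, d1 == d2)
```

With zeroed buffers the byte image depends only on the input fields, which is the property a fingerprint-based check needs.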

I think that there is a decent chance that it just wouldn't make sense
for an operator class author to ever do something that we need to
worry about. I'm pretty sure that it's just the TOAST thing. But it's
worth thinking about carefully.

[1] https://www.postgresql.org/docs/devel/btree-support-funcs.html
--
Peter Geoghegan
