Re: Amcheck verification of GiST and GIN

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2023-02-02 19:51:45
Message-ID: CAH2-Wz=_Znd0ABX_E6E54iiu67LHKa9ZP3CTgex8FMQG_oCDvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 13, 2023 at 8:15 PM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> (v21 of patch series)

I can see why the refactoring patch is necessary overall, but I have
some concerns about the details. More specifically:

* PageGetItemIdCareful() doesn't seem like it needs to be moved to
amcheck.c and generalized to work with GIN and GiST.

It seems better to just allow some redundancy, by having static/local
versions of PageGetItemIdCareful() for both GIN and GiST. There are
numerous reasons why that seems better to me. For one thing it's
simpler. For another, the requirements are already a bit different,
and may become more different in the future. I have seriously
considered adding a new PageGetItemCareful() routine to nbtree in the
past (which would work along similar lines when we access
IndexTuples), which would have to be quite different across each index
AM. Maybe this idea of adding a PageGetItemCareful() would totally
supersede the existing PageGetItemIdCareful() function.

But even now, without any of that, the rules for
PageGetItemIdCareful() are already different. For example, with GIN
you cannot have LP_DEAD bits set, so ISTM that you should be checking
for that in its own custom version of PageGetItemIdCareful().

You can just have comments that refer the reader to the original
nbtree version of PageGetItemIdCareful() for a high level overview.

* You have distinct versions of the current btree_index_checkable()
function for both GIN and GiST, which doesn't seem necessary to me --
so this is kind of the opposite of the situation with
PageGetItemIdCareful() IMV.

The only reason to have separate versions of these is to detect when
the wrong index AM is used -- the other 2 checks are 100% common to
all index AMs. Why not just move that one non-generic check out of the
function, to each respective index AM .c file, while keeping the other
2 generic checks in amcheck.c?

Once things are structured this way, it would then make sense to add a
can't-be-LP_DEAD check to the GIN specific version of
PageGetItemIdCareful().

I also have some questions about the verification functionality itself:

* Why haven't you done something like palloc_btree_page() for both
GiST and GIN, and use that for everything?

Obviously this may not be possible in100% of all cases -- even
verify_nbtree.c doesn't manage that. But I see no reason for that
here. Though, in general, it's not exactly clear what's going on with
buffer lock coupling in general.

* Why does gin_refind_parent() buffer lock the parent while the child
buffer lock remains held?

In any case this doesn't really need to have any buffer lock coupling.
Since you're both of the new verification functions you're adding are
"parent" variants, that acquire a ShareLock to block concurrent
modifications and concurrent VACUUM?

* Oh wait, they don't use a ShareLock at all -- they use an
AccessShareLock. This means that there are significant inconsistencies
with the verify_nbtree.c scheme.

I now realize that gist_index_parent_check() and
gin_index_parent_check() are actually much closer to bt_index_check()
than to bt_index_parent_check(). I think that you should stick with
the convention of using the word "parent" whenever we'll need a
ShareLock, and omitting "parent" whenever we will only require an
AccessShareLock. I'm not sure if that means that you should change the
lock strength or change the name of the functions. I am sure that you
should follow the general convention that we have already.

I feel rather pessimistic about our ability to get all the details
right with GIN. Frankly I have serious doubts that GIN itself gets
everything right, which makes our task just about impossible. The GIN
README did gain a "Concurrency" section in 2019, at my behest, but in
general the locking protocols are still chronically under-documented,
and have been revised in various ways as a response to bugs. So at
least in the case of GIN, we really need amcheck coverage, but should
take a very conservative approach.

With GIN I think that we need to make the most modest possible
assumptions about concurrency, by using a ShareLock. Without that, I
think that we can have very little confidence in the verification
checks -- the concurrency rules are just too complicated right now.
Maybe it will be possible in the future, but right now I'd rather not
try that. I find it very difficult to figure out the GIN locking
protocol, even for things that seem like they should be quite
straightforward. This situation would be totally unthinkable in
nbtree, and perhaps with GiST.

* Why does the GIN patch change a comment in contrib/amcheck/amcheck.c?

* There is no pg_amcheck patch here, but I think that there should be,
since that is now the preferred and recommended way to run amcheck in
general.

We could probably do something very similar to what is already there
for nbtree. Maybe it would make sense to change --heapallindexed and
--parent-check so that they call your parent check functions for GiST
and GIN -- though the locking/naming situation must be resolved before
we decide what to do here, for pg_amcheck.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-02 20:09:57 Re: Weird failure with latches in curculio on v15
Previous Message Nathan Bossart 2023-02-02 19:37:00 Re: recovery modules