Re: Amcheck verification of GiST and GIN

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2023-03-16 23:48:09
Message-ID: CAH2-WzmUuuyZJkAnDUnP_TmfQMsa1JeO5yimcAeb5qczGB-0kg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 5, 2023 at 4:45 PM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> Here's v24 == (v23 + a step for pg_amcheck). There's a lot of
> shotgun-style changes, but I hope next index types will be easy to add
> now.

Some feedback on the GiST patch:

* You forgot to initialize GistCheckState.heaptuplespresent to 0.

It might be better to allocate GistCheckState dynamically, using
palloc0(). That's future proof. "Simple and obvious" is usually the
most important goal for managing memory in amcheck code. It can be a
little inefficient if that makes it simpler.

* ISTM that gist_index_check() should allow the caller to omit a
"heapallindexed" argument by specifying "DEFAULT FALSE", for
consistency with bt_index_check().

(Actually there are two versions of bt_index_check(), with
overloading, but that's just because of the way that the extension
evolved over time).

* What's the point in having a custom memory context that is never reset?

I believe that gistgetadjusted() will leak memory here, so there is a
need for some kind of high level strategy for managing memory. The
strategy within verify_nbtree.c is to call MemoryContextReset() right
after every loop iteration within bt_check_level_from_leftmost() --
which is pretty much once every call to bt_target_page_check(). That
kind of approach is obviously not going to suffer any memory leaks.

Again, "simple and obvious" is good for memory management in amcheck.

* ISTM that it would be clearer if the per-page code within
gist_check_parent_keys_consistency() was broken out into its own
function -- a little like bt_target_page_check()..

That way the control flow would be easier to understand when looking
at the code at a high level.

* ISTM that gist_refind_parent() should throw an error about
corruption in the event of a parent page somehow becoming a leaf page.

Obviously this is never supposed to happen, and likely never will
happen, even with corruption. But it seems like a good idea to make
the most conservative possible assumption by throwing an error. If it
never happens anyway, then the fact that we handle it with an error
won't matter -- so the error is harmless. If it does happen then we'll
want to hear about it as soon as possible -- so the error is useful.

* I suggest using c99 style variable declarations in loops.

Especially for things like "for (OffsetNumber offset =
FirstOffsetNumber; ... ; ... )".

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-03-17 00:00:10 Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Previous Message Nathan Bossart 2023-03-16 23:47:01 Re: improving user.c error messages