Re: amcheck verification for GiST

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: amcheck verification for GiST
Date: 2019-03-27 23:57:17
Message-ID: CAH2-WzkswQoYNTgAN0A+WV+=KmEPtuNUv-drEZ+_Q__zjQX4KA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH,
it has to work in a way that makes it an extension of
bt_index_check(), which is not a problem for
gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent page,
> then unlock it, and continue to look at remaining tuples, using the
> pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward now,
> IMHO.

You probably didn't mean to leave this "boom" error behind:

> + if (GistPageIsDeleted(page))
> + {
> + elog(ERROR,"boom");

I see that you have this check for deleted pages:

> + if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index \"%s\" has deleted page with tuples",
> + RelationGetRelationName(rel))));
> + }

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage?
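Just to illustrate what I have in mind (a sketch only, reusing the names from the quoted hunk; I haven't compiled this against your patch):

```c
/* Hypothetical symmetric check for live (non-deleted) pages */
if (!GistPageIsDeleted(page) &&
    PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
    ereport(ERROR,
            (errcode(ERRCODE_INDEX_CORRUPTED),
             errmsg("index \"%s\" has page with invalid max offset number",
                    RelationGetRelationName(rel))));
```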

I see various errors like this one:

> + if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index \"%s\" has inconsistent tuple sizes",
> + RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that that kind of stuff could be added in
a number of places.
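For example, something along these lines (a sketch; the blkno/offset/lsn variables are assumed to be available at the call site, which may not match your refactored code exactly):

```c
ereport(ERROR,
        (errcode(ERRCODE_INDEX_CORRUPTED),
         errmsg("index \"%s\" has inconsistent tuple sizes",
                RelationGetRelationName(rel)),
         /* point the user at the corrupt tuple, plus the page LSN */
         errdetail_internal("Block=%u offset=%u lsn=%X/%X.",
                            blkno, offset,
                            (uint32) (lsn >> 32), (uint32) lsn)));
```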

I see this stuff that's related to concurrent processes:

> + /*
> + * It's possible that the page was split since we looked at the parent,
> + * so that we didn't miss the downlink of the right sibling when we
> + * scanned the parent. If so, add the right sibling to the stack now.
> + */

> + /*
> + * There was a discrepancy between parent and child tuples.
> + * We need to verify it is not a result of concurrent call
> + * of gistplacetopage(). So, lock parent and try to find downlink
> + * for current page. It may be missing due to concurrent page
> + * split, this is OK.
> + */

Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?

> + stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held
like this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory. Maybe it's worthwhile to do something clever with
concurrently holding buffer locks, but if that's what we're doing here
then I would also expect us to have something weaker than ShareLock as
our relation-level heavyweight lock. And, there should be a prominent
explanation of the theory behind it somewhere.
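Roughly what I'm imagining, modeled on amcheck's palloc_btree_page() (again just a sketch, minus the page header sanity checks that the real thing would need):

```c
/* Hypothetical palloc_gist_page(): return a palloc()'d copy of the page */
static Page
palloc_gist_page(Relation rel, BlockNumber blocknum)
{
    Page    page = palloc(BLCKSZ);
    Buffer  buffer;

    buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blocknum, RBM_NORMAL, NULL);

    /* Hold the buffer lock only long enough to copy the page */
    LockBuffer(buffer, GIST_SHARE);
    memcpy(page, BufferGetPage(buffer), BLCKSZ);
    UnlockReleaseBuffer(buffer);

    /* ... basic sanity checks on the copied page header go here ... */
    return page;
}
```

That way all subsequent checks operate on a stable copy, and the question of holding buffer locks across gist_refind_parent() goes away.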

> What have you been using to test this?

pg_hexedit has full support for GiST. ;-)

--
Peter Geoghegan
