From: | Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Amcheck verification of GiST and GIN |
Date: | 2025-05-09 12:43:46 |
Message-ID: | CAE7r3MJ611B9TE=YqBBncewp7-k64VWs+sjk7XF6fJUX77uFBA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
Thanks everybody for the patch.
I noticed there are no tests that GIN check fails if the index is
corrupted, so I thought it would be great to have some.
While writing tests I noticed some issues in the patch (all issues are
for verify_gin.c)
1) When we add new items to the entry tree stack, ptr->parenttup is always null
because GinPageGetOpaque(page)->rightlink is never NULL.
/* last tuple in layer has no high key */
if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
ptr->parenttup = CopyIndexTuple(idxtuple);
else
ptr->parenttup = NULL;
Right way to check if entry doesn't have right neighbour is
GinPageGetOpaque(page)->rightlink == InvalidBlockNumber
But even if we fix it, the condition would not do what the comment
says. If we want to have NULL as parenttup
only for the last tuple of the btree layer, the right check would be:
if (i == maxoff && rightlink == InvalidBlockNumber)
ptr->parenttup = NULL;
else
ptr->parenttup = CopyIndexTuple(idxtuple);
2) Check don't use attnum in comparisons, but for multicolumn indexes
attnum defines order. When we compare max entry page key
with parent key we ignore attnum. It means we occasionally can try to
compare keys of different columns.
While checking order within the same page we skip checking order for
tuples with different attnums now,
but it can be checked. Fix is easy: using ginCompareAttEntries()
instead of ginCompareEntries().
3) Here a code of the split detection
if (rightlink != InvalidBlockNumber &&
ginCompareEntries(&state, attnum, page_max_key,
page_max_key_category, parent_key,
parent_key_category) > 0)
{
/* split page detected, install right link to the stack */
Condition seems not right, because the child page max item key never
can be bigger then parent key.
It can be equal to the parentkey, and it means that there was no split
and the parent key that we cached in the stack is still
relevant. Or it could be less then cached parent key and it means that
split took place and old max item key moved to the
right neighbour and current page max item key should be less then
cached parent key. So I think we should replace > with <.
4) Here is the code for checking the order within the entry page.
/*
* First block is metadata, skip order check. Also, never check
* for high key on rightmost page, as this key is not really
* stored explicitly.
*
* Also make sure to not compare entries for different attnums,
* which may be stored on the same page.
*/
if (i != FirstOffsetNumber && attnum == prev_attnum &&
stack->blkno != GIN_ROOT_BLKNO &&
!(i == maxoff && rightlink == InvalidBlockNumber))
{
prev_key = gintuple_get_key(&state, prev_tuple,
&prev_key_category);
if (ginCompareEntries(&state, attnum, prev_key,
prev_key_category, current_key,
current_key_category) >= 0)
We skip checking the order for the root page, it's not clear why.
Probably there is some mess with the meta page, because
comment says "First block is metadata, skip order check". So I think
we can remove
stack->blkno != GIN_ROOT_BLKNO
5) The same place as 4). We skip checking the order for the high key
on the rightmost page, as this key is not really stored explicitly,
but for leaf pages all keys are stored explicitly, so we can check the
order for the last item of the leaf page too.
So I think we can change the condition to this:
!(i == maxoff && rightlink == InvalidBlockNumber &&
!GinPageIsLeaf(page))
6) In posting tree parent key check part:
/*
* Check if this tuple is consistent with the downlink in the
* parent.
*/
if (stack->parentblk != InvalidBlockNumber && i == maxoff &&
ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
ereport(ERROR,
...
Here we don't check if stack->parentkey is valid, so sometimes we
compare invalid parentkey (because we can have
valid parentblk and invalid parentkey the same time). Invalid
parentkey is always bigger, so the code never triggers
ereport, but it doesn't look right. so probably we can rewrite it this way:
if (i == maxoff && ItemPointerIsValid(&stack->parentkey) &&
ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
7) When producing stack entries for posting tree check, we set parent
key like this:
/*
* Set rightmost parent key to invalid item pointer. Its value
* is 'Infinity' and not explicitly stored.
*/
if (rightlink == InvalidBlockNumber)
ItemPointerSetInvalid(&ptr->parentkey);
else
ptr->parentkey = posting_item->key;
We set invalid parent key for all items of the rightmost page. But
it's the only rightmost item that doesn't have an explicit
parentkey (actually the comment says exactly this, but the code does a
different thing). All others have an explicit parent
key and we can set it. So fix can look like this:
if (rightlink == InvalidBlockNumber && i == maxoff)
ItemPointerSetInvalid(&ptr->parentkey);
else
ptr->parentkey = posting_item->key;
But for (rightlink == InvalidBlockNumber && i == maxoff)
posting_item->key is always (0,0) (we check it a little bit earlier),
so I think we can simplify it:
ptr->parentkey = posting_item->key;
8) In the function gin_refind_parent() the code
if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
triggers Assert(ItemPointerIsValid(pointer)) within the
ItemPointerGetBlockNumber(), because itup->t_tid could be invalid.
AFAIS GIN code uses a special method GinGetDownlink(itup) that avoids
this Assert. So we can use it here too.
Please find the attached patch with fixes for issues above. Also there
are added regression test for multicolumn index,
and several tap tests with some basic corrupted index cases. I'm not
sure if it's the right way to write such tests and would
be glad to hear any feedback, especially about
invalid_entry_columns_order_test() where it seems important to
preserve
byte ordering. Also all tests expect standard page size 8192 now.
Also there are several points that I think also worth addressing:
9) Field 'parentlsn' is set but never actually used for any check. Or
I missed something.
10) README says "Vacuum never deletes tuples or pages from the entry
tree." But check assumes that it's possible to have
deleted leaf page with 0 entries.
if (GinPageIsDeleted(page))
{
if (!GinPageIsLeaf(page))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has deleted internal page %u",
RelationGetRelationName(rel), blockNo)));
if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has deleted page %u with tuples",
RelationGetRelationName(rel), blockNo)));
}
11) When we compare entry tree max page key with parent key:
if (ginCompareAttEntries(&state, attnum, current_key,
current_key_category, parent_key_attnum,
parent_key, parent_key_category) > 0)
{
/*
* There was a discrepancy between parent and child
* tuples. We need to verify it is not a result of
* concurrent call of gistplacetopage(). So, lock parent
* and try to find downlink for current page. It may be
* missing due to concurrent page split, this is OK.
*/
pfree(stack->parenttup);
stack->parenttup = gin_refind_parent(rel, stack->parentblk,
stack->blkno, strategy);
I think we can remove gin_refind_parent() and do ereport right away here.
The same logic as with 3). AFAIK it's impossible to have a child item
with a key that is higher than the cached parent key.
Parent key bounds what keys we can insert into the child page, so it
seems there is no way how they can appear there.
Best regards,
Arseniy Mukhin
Attachment | Content-Type | Size |
---|---|---|
0001-verify-gin-fixes-and-tests.patch | text/x-patch | 17.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2025-05-09 12:53:52 | Re: strange perf regression with data checksums |
Previous Message | Aleksander Alekseev | 2025-05-09 12:43:18 | Re: [PATCH] avoid double scanning in function byteain |