post-recovery amcheck expectations

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: post-recovery amcheck expectations
Date: 2023-10-05 02:52:32
Message-ID: 20231005025232.c7.nmisch@google.com
Lists: pgsql-hackers

Suppose we start with this nbtree (subset of a diagram from verify_nbtree.c):

 *     1
 *    / \
 *   2 <-> 3

We're deleting 2, the leftmost leaf under a leftmost internal page. After the
MARK_PAGE_HALFDEAD record, the first downlink from 1 will lead to 3, which
still has a btpo_prev pointing to 2. bt_index_parent_check() complains here:

    /* The first page we visit at the level should be leftmost */
    if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
                        RelationGetRelationName(state->rel)),
                 errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
                                    state->targetblock, blkno,
                                    LSN_FORMAT_ARGS(state->targetlsn))));

One can encounter this if recovery ends between a MARK_PAGE_HALFDEAD record
and its corresponding UNLINK_PAGE record. See the attached test case. The
index is actually fine in such a state, right? I lean toward fixing this by
having amcheck scan left; if left links reach only half-dead or deleted pages,
that's as good as the present child block being P_LEFTMOST. There's a
different error from bt_index_check(), and I've not yet studied how to fix
that:

    ERROR:  left link/right link pair in index "not_leftmost_pk" not in agreement
    DETAIL:  Block=0 left block=0 left link from block=4294967295.

Alternatively, one could view this as a need for the user to VACUUM between
recovery and amcheck. The documentation could direct users to "VACUUM
(DISABLE_PAGE_SKIPPING off, INDEX_CLEANUP on, TRUNCATE off)" if not done since
last recovery. Does anyone prefer that or some other alternative?

For some other amcheck expectations, the comments suggest reliance on the
bt_index_parent_check() ShareLock. I haven't tried to make test cases for
them, but perhaps recovery can trick them the same way. Examples:

    errmsg("downlink or sibling link points to deleted block in index \"%s\"",
    errmsg("block %u is not leftmost in index \"%s\"",
    errmsg("block %u is not true root in index \"%s\"",

Thanks,
nm

Attachment: amcheck-post-recovery-v0.patch (Content-Type: text/plain, Size: 3.2 KB)
