VACUUM can finish an interrupted nbtree page split -- is that okay?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: VACUUM can finish an interrupted nbtree page split -- is that okay?
Date: 2019-03-01 23:59:29
Message-ID: CAH2-Wz=_Xvv8byzK_LvY4ci76OgsHCQzoKF7We8yG9waO7j6rA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

_bt_lock_branch_parent() is used by VACUUM during page deletion, and
calls _bt_getstackbuf(), which always finishes incomplete page splits
for the parent page that it exclusive locks and returns. ISTM that
this may be problematic, since it contradicts the general rule that
VACUUM isn't supposed to finish incomplete page splits. According to
the nbtree README:

"It would seem natural to add the missing downlinks in VACUUM, but since
inserting a downlink might require splitting a page, it might fail if you
run out of disk space. That would be bad during VACUUM - the reason for
running VACUUM in the first place might be that you run out of disk space,
and now VACUUM won't finish because you're out of disk space. In contrast,
an insertion can require enlarging the physical file anyway."

I'm inclined to note this as an exception in the nbtree README, and
leave it at that. Interrupted internal page splits are probably very
rare in practice, so the operational risk of running out of disk space
like this is minimal.

FWIW, I notice that the logic that appears after the
_bt_lock_branch_parent() call to _bt_getstackbuf() anticipates that it
must defend against interrupted splits in at least the
grandparent-of-leaf page, and maybe even the parent, so it's probably
not unworkable to not finish the split:

/*
* If the target is the rightmost child of its parent, then we can't
* delete, unless it's also the only child.
*/
if (poffset >= maxoff)
{
/* It's rightmost child... */
if (poffset == P_FIRSTDATAKEY(opaque))
{
/*
* It's only child, so safe if parent would itself be removable.
* We have to check the parent itself, and then recurse to test
* the conditions at the parent's parent.
*/
if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) ||
P_INCOMPLETE_SPLIT(opaque))
{
_bt_relbuf(rel, pbuf);
return false;
}

Separately, I noticed another minor issue that appears a few lines
further down still:

/*
* Perform the same check on this internal level that
* _bt_mark_page_halfdead performed on the leaf level.
*/
if (_bt_is_page_halfdead(rel, *rightsib))
{
elog(DEBUG1, "could not delete page %u because its
right sibling %u is half-dead",
parent, *rightsib);
return false;
}

I thought that internal pages were never half-dead after Postgres 9.4.
If that happens, then the check within _bt_pagedel() will throw an
ERRCODE_INDEX_CORRUPTED error, and tell the DBA to REINDEX. Shouldn't
this internal level _bt_is_page_halfdead() check contain a "can't
happen" error, or even a simple assertion?

--
Peter Geoghegan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-02 00:11:46 Re: NOT IN subquery optimization
Previous Message Tom Lane 2019-03-01 23:49:57 Re: [HACKERS] Incomplete startup packet errors