Possible duplicate release of buffer lock.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Possible duplicate release of buffer lock.
Date: 2016-08-03 08:31:16
Message-ID: 20160803.173116.111915228.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

I had an inquiry about the following log messages.

2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,1,,2016-07-20 10:16:50 JST,30/75,0,LOG,00000,"no left sibling (concurrent deletion?) in ""some_index_rel""",,,,,,,,"_bt_unlink_halfdead_page, nbtpage.c:1643",""
2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,2,,2016-07-20 10:16:50 JST,30/75,0,ERROR,XX000,"lock main 13879 is not held",,,,,"automatic vacuum of table ""db.nsp.tbl""",,,"LWLockRelease, lwlock.c:1137",""

These were logged after a pg_upgrade from 9.1.13 to 9.4.

The first line is emitted on a concurrent deletion of an index
page, which by design cannot happen in a consistent state. So the
reported situation is most likely the result of index corruption
that occurred before the upgrade, specifically inconsistent
sibling pointers around a deleted page.

I noticed the following part of nbtpage.c, which is related to
this. It is still the same in master.

nbtpage.c:1635@9.4.8:

> while (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     /* step right one page */
>     leftsib = opaque->btpo_next;
>     _bt_relbuf(rel, lbuf);
>     if (leftsib == P_NONE)
>     {
>         elog(LOG, "no left sibling (concurrent deletion?) in \"%s\"",
>              RelationGetRelationName(rel));
>         return false;

Given the while loop's condition, if the immediate left sibling
of the target is (mistakenly, of course) in the deleted state,
and the target still points to that deleted page as its left
sibling, lbuf eventually moves past the target to its right. This
seems to result in an unintended release of the lock on the
target, and hence in the second log message.

My point is that if concurrent deletion cannot happen with the
current implementation, this while loop could be removed, and the
code could immediately error out (or log a message) instead:

> if (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     elog(ERROR, "no left sibling of page %u (concurrent deletion?) in \"%s\"",..

Or, at the very least, the while loop should stop before it
overshoots the target:

> while (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     /* step right one page */
>     leftsib = opaque->btpo_next;
>     _bt_relbuf(rel, lbuf);
>     if (leftsib == target || leftsib == P_NONE)
>     {
>         elog(ERROR, "no left sibling of page %u (concurrent deletion?) in \"%s\"",..

I'd like to propose the former, since the latter still does not
handle such situations perfectly anyway.

Any thoughts or opinions?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
