Quick Links

Re: strange perf regression with data checksums

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Aleksander Alekseev <aleksander(at)timescale(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: strange perf regression with data checksums
Date:	2025-06-09 14:47:35
Message-ID:	CAH2-WzniimU5FuOFejLfdk-DkTBtTddpR8GM+sLj4_Lo48n_Ag@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Alexander,

On Mon, Jun 9, 2025 at 2:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> Please look at the following script, which falsifies an assertion

> TRAP: failed Assert("!BTScanPosIsPinned(so->currPos)"), File: "nbtutils.c", Line: 3379, PID: 1621028

I can reproduce this. I think that it's an old bug.

The problem is related to mark/restore (used by merge joins), where
nbtree can sometimes independently hold a second pin on leaf pages
(albeit very briefly, inside _bt_steppage). The cause of the failure
of my recently-added assertion is that it assumes that so->dropPin
always implies !BTScanPosIsPinned(so->currPos) at the start of
_bt_killitems. I don't think that that assumption is unreasonable,
even in light of this assertion problem.

I think that the real problem is the way that _bt_killitems releases
resources. _bt_killitems doesn't try to leave so->currPos as it was at
the start of its call -- this is nothing new. The overall effect is
that during so->dropPin _bt_steppage calls, we *generally* don't call
IncrBufferRefCount in the "bump pin on current buffer for assignment
to mark buffer" path, but *will* call IncrBufferRefCount when we
happened to need to call _bt_killitems. Calling _bt_killitems
shouldn't have these side-effects.

I can make the assertion failure go away by teaching _bt_killitems to
leave so->currPos in the same state that it was in when _bt_killitems
was called. If there was no pin on the page when _bt_killitems was
called, then there should be no pin held when it returns. See the
attached patch.

As I said, I think that this is actually an old bug. After all,
_bt_killitems is required to test so->currPos.lsn against the same
page's current LSN if the page pin was dropped since _bt_readpage was
first called -- regardless of any other details. I don't think that
it'll consistently do that in this hard-to-test mark/restore path on
any Postgres version, so there might be a risk of failing to detect an
unsafe concurrent TID recycling hazard.

I've always thought that the whole idiom of testing
BTScanPosIsPinned() in a bunch of places was very brittle. I wonder if
it makes sense to totally replace those calls/tests with similar tests
of the new so->dropPin field.

--
Peter Geoghegan

Attachment	Content-Type	Size
v1-0001-Fix-issue-with-mark-restore-nbtree-pins.patch	application/octet-stream	2.6 KB

In response to

Re: strange perf regression with data checksums at 2025-06-09 06:00:00 from Alexander Lakhin

Responses

Re: strange perf regression with data checksums at 2025-06-09 19:04:43 from Peter Geoghegan
Re: strange perf regression with data checksums at 2025-06-10 19:00:01 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2025-06-09 14:49:27	Re: strange perf regression with data checksums
Previous Message	Robert Treat	2025-06-09 14:32:11	Re: doc pg_constraint.convalidated column description need update