From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Aleksander Alekseev <aleksander(at)timescale(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: strange perf regression with data checksums |
Date: | 2025-06-10 19:00:01 |
Message-ID: | CAH2-WzmN0ZUUXfxU0obefUD3Use+xgb1SR-j8tAZGRpECgDdfg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jun 9, 2025 at 10:47 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> As I said, I think that this is actually an old bug. After all,
> _bt_killitems is required to test so->currPos.lsn against the same
> page's current LSN if the page pin was dropped since _bt_readpage was
> first called -- regardless of any other details. I don't think that
> it'll consistently do that in this hard-to-test mark/restore path on
> any Postgres version, so there might be a risk of failing to detect an
> unsafe concurrent TID recycling hazard.
I have confirmed that this flavor of the problem has existed for a
long time. We'll need to backpatch the fix to all supported branches.
Attached v3 breaks this part out into its own commit. v3 has a proper
commit message now, which explains the implications of the bug on
earlier releases. I won't repeat that here, except to say that it's
just about possible that kill_prior_tuple LP_DEAD bit setting will
incorrectly set LP_DEAD bits due to this bug.
I've been thinking about also committing the second patch against
master/Postgres 18 only -- but I've since had cold feet about that. It
is really just refactoring to make the code more robust against these
sorts of issues by conditioning everything on so->dropPin. Right now,
on 18, we're only doing that in _bt_killitems. That was enough to
detect this bug, but it might not work out that way in the future, if
there's a new (or an old) bug of the same general nature somewhere
else.
Current plan: Commit 0001 on all supported branches in the next couple
of days. Unless I hear objections.
I will wait until Postgres 18 branches from master before committing
0002. It's important refactoring work, but on reflection I find it
hard to justify not just waiting.
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
v3-0002-Stop-conditioning-nbtree-actions-on-pin-being-hel.patch | application/octet-stream | 15.2 KB |
v3-0001-Make-_bt_killitems-drop-pins-it-acquired-itself.patch | application/octet-stream | 6.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Sami Imseih | 2025-06-10 19:05:16 | Re: add function for creating/attaching hash table in DSM registry |
Previous Message | Sami Imseih | 2025-06-10 18:32:55 | Re: add function for creating/attaching hash table in DSM registry |