Re: Orphan page in _bt_split

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Orphan page in _bt_split
Date: 2025-09-01 19:04:58
Message-ID: CAH2-WzkGwm2nU8k=3w96jDsFkfiLHo6A2J2mqfG6XvtpVfp+ag@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 1, 2025 at 1:35 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Nice investigation and report, that I assume you have just guessed
> from a read of the code and that there could be plenty of errors that
> could happen in this code path. It indeed looks like some weak
> coding assumption introduced in this code path by 9b42e713761a from
> 2019, going down to v11.

Commit 9b42e713761a really has nothing to do with this. It fixed a
similar issue that slipped into Postgres 11. At worst, commit
9b42e713761a neglected to fix this other problem in passing.

This hazard has existed since commit 8fa30f906b, from 2010. That's
the commit that introduced the general idea of making _bt_split zero
its rightpage in order to make it safe to throw an ERROR instead of just
PANICing.

> We could have a SQL regression test for this case, just put a
> INJECTION_POINT(), then force an ERROR callback to force an incorrect
> state. The test can be made cheap enough.
>
> > This code is not changed for quite long time so I wonder why nobody noticed
> > this error before?
>
> I am ready to believe that errors are just hard to reach in this path.

Why?

There's just no reason to think that we'd ever be able to tie back one
of these LOG messages from VACUUM to the problem within _bt_split.
There are too many other forms of corruption that might result in VACUUM
logging this same error (e.g., breaking changes to a glibc collation).

An important case where this weakness will make life worse for users
is a checksum failure against the existing right sibling page -- since
those are not one-off, transient errors (unlike, say, OOMs). Once you
have an index page with a bad checksum, there's a decent chance that
the application will attempt to insert onto the page to the immediate
left of that bad page. That'll trigger a split, sooner or later. Which
in turn triggers the problem that Konstantin reported. It's not going
to make the corruption problem markedly worse, but it's still not
great: there's no telling how many times successive inserters will try
and inevitably fail to split the same page, creating a new junk page
each time.
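
To make the retry scenario concrete, here is a minimal toy model in C. It
is not PostgreSQL code: ToyIndex, toy_split, toy_orphans and
toy_retry_splits are all hypothetical names, and the model only assumes
what is described above, namely that a split allocates its new right page
before it reads the existing right sibling, and that the read fails every
time because the bad checksum is persistent.

```c
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a nbtree API. */
typedef struct ToyIndex
{
    int  npages;          /* pages allocated in the relation so far */
    int  nlinked;         /* pages reachable through sibling links */
    bool sibling_is_bad;  /* right sibling fails verification on every read */
} ToyIndex;

/* Returns true if the split completed, false if it "threw an ERROR". */
static bool
toy_split(ToyIndex *idx)
{
    /* Step 1: allocate the new right page by extending the relation. */
    idx->npages++;

    /*
     * Step 2: read the existing right sibling so its left link can be
     * updated.  A persistent checksum failure makes this fail each time,
     * and the page from step 1 is never linked into the tree.
     */
    if (idx->sibling_is_bad)
        return false;

    /* Step 3: link the new page into the tree. */
    idx->nlinked++;
    return true;
}

/* Number of pages that were allocated but are unreachable from the tree. */
static int
toy_orphans(const ToyIndex *idx)
{
    return idx->npages - idx->nlinked;
}

/* Simulates successive inserters that each try to split the same page. */
static int
toy_retry_splits(ToyIndex *idx, int attempts)
{
    int failures = 0;

    for (int i = 0; i < attempts; i++)
    {
        if (!toy_split(idx))
            failures++;
    }
    return failures;
}
```

In this model, calling toy_retry_splits() on an index whose
sibling_is_bad flag is set leaves one more unreachable page behind per
attempt, which is the "new junk page each time" behavior. With the flag
clear, every allocated page ends up linked.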

--
Peter Geoghegan
