Re: Backpatching nbtree VACUUM (page deletion) hardening

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Backpatching nbtree VACUUM (page deletion) hardening
Date: 2022-09-03 01:51:27
Message-ID: CAH2-Wzk+vQTqfkEsMZ92Q-_hKJEDS1okDJqe4QNJYKfWrpz-gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 2, 2022 at 6:14 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> This has been a problem for years, and still for years to come with
> libc updates. I am not much into this stuff, but does running VACUUM
> in this case help with the state of the index that used a past,
> now-invalid, collation (be it libc or ICU) to get a bit cleaned up?

Yes -- nbtree VACUUM generally can cope quite well, even when the
index is corrupt. It should mostly manage to do what is expected here,
even with a misbehaving opclass, because it relies as little as
possible on user-defined opclass code.

Even without the hardening in place, nbtree VACUUM will still do a
*surprisingly* good job of recovering when the opclass is broken in
some way: VACUUM just needs the insertion scankey operator class code
to initially determine roughly where to look for the to-be-deleted
page's downlink, one level up in the tree. Even when an operator class
is wildly broken (e.g. the comparator gives a result that it
determines at random), we still won't see problems in nbtree VACUUM
most of the time -- because even being roughly correct is good enough
in practice!

You have to be quite unlucky to hit this, even when the opclass is
wildly broken (which is probably much less common than "moderately
broken").

> When written like that, this surely sounds extremely bad and this
> would need more complex chirurgy (or just running with a build that
> includes this patch?).

The patch will fix the case in question, which I have seen internal
AWS reports about -- though the initial fix that went into 14 wasn't
driven by any complaint from any user. I just happened to notice that
we were throwing an ERROR in nbtree VACUUM for no good reason, which
is something that should be avoided on general principle.

In theory there could be other ways in which you'd run into the same
basic problem (in any index AM). The important point is that we're
better off not throwing any errors in the first place, but if we must
then they had better not be errors that will be repeated again and
again, without any chance of the problem going away naturally. (Not
that it never makes sense to just throw an error; there are meaningful
gradations of "totally unacceptable problem".)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-03 02:06:54 Re: postgres_fdw hint messages
Previous Message Bruce Momjian 2022-09-03 01:47:50 Re: First draft of the PG 15 release notes