Re: Backpatching nbtree VACUUM (page deletion) hardening

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Backpatching nbtree VACUUM (page deletion) hardening
Date: 2022-09-03 01:14:37
Message-ID: YxKqfcnaA+Y/Rqdp@paquier.xyz
Lists: pgsql-hackers

On Fri, Sep 02, 2022 at 02:13:15PM -0700, Peter Geoghegan wrote:
> Postgres 14 commit 5b861baa55 added hardening to nbtree page deletion.
> This had the effect of making nbtree VACUUM robust against misbehaving
> operator classes -- we just LOG the problem and move on, without
> throwing an error. In practice a "misbehaving operator class" is often
> a problem with collation versioning.

This has been a problem for years, and it will remain one for years
to come with libc updates. I am not very familiar with this area, but
does running VACUUM in this case help clean up an index that was built
with a past, now-invalid collation (be it libc or ICU)?

> An error ends the whole VACUUM operation. If VACUUM cannot delete the
> page the first time, there is no reason to think that it'll be any
> different on the second or the tenth attempt. The eventual result
> (absent user/DBA intervention) is that no antiwraparound autovacuum
> will ever complete, leading to an outage when the system hits
> xidStopLimit. (Actually this scenario won't result in the system
> hitting xidStopLimit where the failsafe is available, but that's
> another thing that is only in 14, so that's not any help.)

Put like that, this sounds extremely bad, and recovering from it
would need more complex surgery (or just running a build that
includes this patch?).

> This seems low risk. The commit in question is very simple. It just
> downgrades an old 9.4-era ereport() from ERROR to LOG, and adds a
> "return false;" immediately after that. The function in question is
> fundamentally structured in a way that allows it to back out of page
> deletion because of problems that are far removed from where the
> caller starts from. When and why we back out of page deletion is
> already opaque to the caller, so it's very hard to imagine a new
> problem caused by backpatching. Besides all this, 14 has been out for
> a while now.

Yeah, I take it that we would have seen reports by now if the change
had caused problems, and I don't recall seeing one on the community
lists, at least.
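
For readers who want to see the shape of the change described in the
quoted paragraph above (an ereport() downgraded from ERROR to LOG,
followed by "return false;"), here is a minimal, self-contained sketch
in C. It is not the PostgreSQL code: sketch_report(),
sketch_delete_page(), sketch_vacuum_index(), the SKETCH_* levels and
the message text are all hypothetical stand-ins, and the found_parent
flag only models "the parent search failed", for instance because of
a misbehaving operator class.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical severity levels, standing in for LOG and ERROR. */
typedef enum { SKETCH_LOG, SKETCH_ERROR } SketchLevel;

/*
 * Hypothetical stand-in for ereport(). At SKETCH_ERROR it ends the
 * whole operation, which is what made the old behavior fatal to VACUUM.
 */
static void
sketch_report(SketchLevel level, int blkno)
{
    fprintf(stderr, "%s: could not find parent of page %d\n",
            level == SKETCH_ERROR ? "ERROR" : "LOG", blkno);
    if (level == SKETCH_ERROR)
        exit(EXIT_FAILURE);
}

/*
 * Try to delete one page. Returns false when deletion is backed out.
 * found_parent is an assumption of this sketch: it models whether the
 * search for the parent's downlink succeeded.
 */
bool
sketch_delete_page(int blkno, bool found_parent)
{
    if (!found_parent)
    {
        /* Hardened form: log the problem and back out, do not throw. */
        sketch_report(SKETCH_LOG, blkno);
        return false;
    }
    return true;
}

/*
 * The caller already tolerates "page not deleted", so one bad page no
 * longer prevents the remaining pages from being processed.
 */
int
sketch_vacuum_index(const bool *found_parent, int npages)
{
    int deleted = 0;

    for (int blkno = 0; blkno < npages; blkno++)
    {
        if (sketch_delete_page(blkno, found_parent[blkno]))
            deleted++;
    }
    return deleted;
}
```

With SKETCH_ERROR in place of SKETCH_LOG, the first page whose parent
cannot be found would end the run, and every later attempt would stop
at the same page. With the LOG-and-return form, the loop skips that
page and still finishes.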
--
Michael
