Re: pg_amcheck contrib application

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-23 19:41:54
Message-ID: CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 23, 2021 at 12:05 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Right, good point. But... does that really apply to
> 005_opclass_damage.pl? I feel like with the kind of physical damage
> we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
> from crashing headlong into that table. But, 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

FWIW that is only 99.9% true (contrary to what README.HOT says). This
is the case because nbtree page deletion will in fact search the tree
to find a downlink to the target page, which must be removed at the
same time -- see the call to _bt_search() made within nbtpage.c.

This is much less of a problem than you'd think, though, even with an
opclass that gives wrong answers all the time. Because it's also true
that _bt_getstackbuf() is remarkably tolerant when it actually goes to
locate the downlink -- because that happens via a linear search that
matches on downlink block number (it doesn't use the opclass for that
part). This means that we'll accidentally fail to fail if the page is
*somewhere* to the right in the "true" key space. Which probably means
that it has a greater than 50% chance of not failing with a 100%
broken opclass. Which probably makes our odds better with more
plausible levels of misbehavior (e.g. collation changes).

That being said, I should make _bt_lock_subtree_parent() return false
and back out of page deletion without raising an error in the case
where we really cannot locate a valid downlink. We really ought to
soldier on when that happens, since we'll do that for a bunch of other
reasons already. I believe that the only reason we throw an error
today is for parity with the page split case (the main
_bt_getstackbuf() call). But this isn't the same situation at all --
this is VACUUM.

I will make this change to HEAD soon, barring objections.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-23 19:44:48 Re: pg_amcheck contrib application
Previous Message Tom Lane 2021-03-23 19:35:59 Re: pg_upgrade failing for 200+ million Large Objects