Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.
Date: 2023-03-27 20:51:38
Message-ID: CAH2-Wzn+1cBF+KsJOd-H4WnaNWuBgckSqNPbMeg1Q8eS9zxrXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Mon, Mar 27, 2023 at 1:17 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Patch attached.

This is fine, as far as it goes. Obviously it fixes the immediate problem.

> > I don't think that it's particularly likely that having refined
> > aborted speculative insertion amcheck coverage will make a critical
> > difference to any user, at any time. But "amcheck as documentation of
> > the on-disk format" is reason enough to have it.
>
> Sure, if someone feels like writing the code. I have to admit that I
> have mixed feelings about this whole direction. In concept, I agree
> with you entirely: a fringe benefit of having checks that tell us
> whether or not a page is valid is that it helps to make clear what
> page states we think are valid.

I don't think that it's a fringe benefit; it's just not necessarily of
direct benefit to amcheck users.

Before the HOT chain validation patch went in, it was unclear whether
certain conceivable on-disk states should constitute corruption. In
particular, it wasn't clear to anybody whether or not it was okay for
an LP_REDIRECT to point to an LP_DEAD until recently (and probably
other things besides that). I don't think that we should assume that
the easy part is abstractly defining corruption, while the hard part
is writing the tool to check for the corruption. Sometimes it is, but
I think that it's often the other way around.

> In practice, however, the point you
> raise in your first sentence weighs awfully heavily with me. Spending
> a lot of energy on checks that are unlikely to catch practical
> problems feels like it may not be the best use of time.

That definitely could be true, but I don't think that it's terribly
much extra effort in most cases.

> I'm not sure
> exactly where to draw the line, but it seems highly likely to be that
> there are things we could deduce about the page that wouldn't be worth
> the effort. For example, would we bother checking that a tuple with an
> in-progress xmin does not have a smaller natts value than a tuple with
> a committed xmin? Or that natts values are non-decreasing across a HOT
> chain? I suspect there are even more obscure examples of things that
> should be true but might not really be worth worrying about in the
> code.

A related way of looking at it (that I also find appealing) is that
it's often easier (far easier) to just have the check, and be done
with it. Of course there is bound to be uncertainty about how useful
any given check might be; we're looking for something that is
theoretically never supposed to happen. Why not just assume that it
might matter if it's not costing very much to check for it?

This is quite a different mentality than the one we bring to core
heapam code, where it's quite natural to just avoid strange corner
cases in the on-disk format like the plague. The risk profile is
totally different for amcheck code. Within amcheck, I'd rather go too
far than not go far enough.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Michael Paquier 2023-03-27 22:36:53 pgsql: Generate a few more functions of pgstatfuncs.c with macros
Previous Message Robert Haas 2023-03-27 20:17:32 Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.

Browse pgsql-hackers by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2023-03-27 21:13:23 Re: Memory leak from ExecutorState context?
Previous Message Robert Haas 2023-03-27 20:47:22 Re: running logical replication as the subscription owner