Re: pg_amcheck contrib application

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-16 19:31:42
Message-ID: 4F6D1038-ECC7-4EDF-B3D4-36E3240FCBBF@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mar 16, 2021, at 11:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 16, 2021 at 2:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm circling back around to the idea that amcheck is trying to
>> validate TOAST references that are already dead, and it's getting
>> burnt because something-or-other has already removed the toast
>> rows, though not the referencing datums. That's legal behavior
>> once the rows are marked dead. Upthread it was claimed that
>> amcheck isn't doing that, but this looks like a smoking gun to me.
>
> I think this theory has some legs. From check_tuple_header_and_visibilty():
>
> else if (!(infomask & HEAP_XMAX_COMMITTED))
> return true; /*
> HEAPTUPLE_DELETE_IN_PROGRESS or
> *
> HEAPTUPLE_LIVE */
> else
> return false; /*
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
> }
> return true; /* not dead */
> }
>
> That first case looks wrong to me. Don't we need to call
> get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
> and the xmax is marked committed, we consider the tuple checkable. The
> comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the
> xmax is all-visible. And in the second case the comment says it's
> either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
> case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
> status.
>
> Another thought here is that it's probably not wicked smart to be
> relying on the hint bits to match the actual status of the tuple in
> cases of corruption. Maybe we should be warning about tuples that are
> have xmin or xmax flagged as committed or invalid when in fact clog
> disagrees. That's not a particularly uncommon case, and it's hard to
> check.

This code was not committed as part of the recent pg_amcheck work, but longer ago, and I'm having trouble reconstructing exactly why it was written that way.

Changing check_tuple_header_and_visibilty() fixes the regression test and also manual tests against the "regression" database that I've been using. I'd like to ponder the changes a while longer before I post, but the fact that these changes fix the tests seems to add credibility to this theory.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-03-16 19:52:09 Re: pg_amcheck contrib application
Previous Message Robert Haas 2021-03-16 19:31:01 Re: [HACKERS] Custom compression methods