"failed to find parent tuple for heap-only tuple" error as an ERRCODE_DATA_CORRUPTION ereport()

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: "failed to find parent tuple for heap-only tuple" error as an ERRCODE_DATA_CORRUPTION ereport()
Date: 2017-12-15 22:31:50
Message-ID: CAH2-Wzmn4-Pg-UGFwyuyK-wiTih9j32pwg_7T9iwqXpAUZr=Mg@mail.gmail.com
Lists: pgsql-hackers

Commit d70cf811, from 2014, promoted an Assert() within
IndexBuildHeapScan() to a "can't happen" elog() error, in order to
detect when a parent tuple cannot be found for some heap-only tuple.
If this happens, it indicates corruption. I think that we should
make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTED,
to match what Andres just added to the code that deals with freezing.
(He promoted Assert()s to errors, just like the 2014 commit, though he
went as far as making them ereport()s to begin with.) The attached
patch does this.

I propose a backpatch to 9.3, partly for the sake of tools like
amcheck, where users may only be on the lookout for
ERRCODE_DATA_CORRUPTED and ERRCODE_INDEX_CORRUPTED.
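
To illustrate why the errcode matters to such users, here is a minimal,
standalone sketch (not PostgreSQL code). It relies on the SQLSTATE values
from PostgreSQL's error code table: XX000 for internal_error, which is what
a plain elog(ERROR) reports by default, XX001 for data_corrupted, and XX002
for index_corrupted. The TOY_ macros and toy_is_corruption_sqlstate() are
hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* SQLSTATE values as listed in PostgreSQL's error code table. */
#define TOY_SQLSTATE_INTERNAL_ERROR  "XX000"	/* default for elog(ERROR) */
#define TOY_SQLSTATE_DATA_CORRUPTED  "XX001"
#define TOY_SQLSTATE_INDEX_CORRUPTED "XX002"

/*
 * Hypothetical filter, of the kind a monitoring script might apply:
 * returns true only for the two corruption SQLSTATEs.
 */
static bool
toy_is_corruption_sqlstate(const char *sqlstate)
{
	assert(sqlstate != NULL);

	return strcmp(sqlstate, TOY_SQLSTATE_DATA_CORRUPTED) == 0 ||
		strcmp(sqlstate, TOY_SQLSTATE_INDEX_CORRUPTED) == 0;
}
```

A filter like this never matches XX000, so an error raised with a bare
elog() would go unnoticed by it even though it indicates corruption.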

FWIW, an old MultiXact/recovery bug, alluded to by the commit message
of d70cf811 [1] (and fixed by 6bfa88acd) was the cause of some déjà vu
for me while looking into the "freeze the dead" issues. Because the
enhanced amcheck [2] actually raised this error when I went to verify
the first "freeze the dead" bugfix [3], it's clearly effective as a
test for certain types of corruption. If CREATE
INDEX/IndexBuildHeapScan() didn't already perform this check, then it
would probably be necessary for amcheck to implement it on its own.
What heap_get_root_tuples() does for us here is ideally suited to
finding inconsistencies in HOT chains, because it matches xmin against
xmax, looks at line pointer bits/redirects, and consults pg_multixact
if necessary. The only thing that it *doesn't* do is make sure that
hint bits accurately reflect what it says in the CLOG -- we'll need to
find another way to do that, by directly targeting heap relations with
their own function. In short, it does an awful lot for tools like
amcheck, and I want to make sure that we get the full benefit of that.
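
For readers unfamiliar with the mechanism, the following is a deliberately
simplified, standalone toy model of the kind of chain walk described above.
It is not the real heap_get_root_tuples(): ToyItem, toy_get_root_offsets()
and toy_count_orphans() are hypothetical, transaction IDs are plain
integers, and there is no pg_multixact lookup. It only shows how matching
each chain member's xmin against its predecessor's xmax, and following
redirects, leaves a heap-only tuple without a root when the chain is
inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not PostgreSQL's. */
typedef enum { TOY_UNUSED, TOY_NORMAL, TOY_REDIRECT, TOY_DEAD } ToyLpKind;

typedef struct ToyItem
{
	ToyLpKind	kind;
	unsigned	redirect_to;	/* 1-based target, for TOY_REDIRECT */
	bool		heap_only;		/* reachable only through a HOT chain */
	bool		hot_updated;	/* has a successor in the chain */
	unsigned	xmin;			/* inserting transaction */
	unsigned	xmax;			/* updating transaction, 0 if none */
	unsigned	next;			/* 1-based offset of the successor */
} ToyItem;

/* Set roots[i] to the 1-based root offset of item i+1, or 0 if none. */
static void
toy_get_root_offsets(const ToyItem *items, unsigned n, unsigned *roots)
{
	assert(items != NULL && roots != NULL);

	for (unsigned i = 0; i < n; i++)
		roots[i] = 0;

	for (unsigned off = 1; off <= n; off++)
	{
		const ToyItem *it = &items[off - 1];
		unsigned	next;
		unsigned	prior_xmax;

		if (it->kind == TOY_NORMAL)
		{
			if (it->heap_only)
				continue;		/* only reachable via its chain */
			roots[off - 1] = off;
			if (!it->hot_updated)
				continue;
			next = it->next;
			prior_xmax = it->xmax;
		}
		else if (it->kind == TOY_REDIRECT)
		{
			next = it->redirect_to;
			prior_xmax = 0;		/* a redirect has no xmax to match */
		}
		else
			continue;			/* unused or dead line pointer */

		/* Walk the chain; the step bound guards against cycles. */
		for (unsigned steps = 0; steps < n && next >= 1 && next <= n; steps++)
		{
			const ToyItem *member = &items[next - 1];

			if (member->kind != TOY_NORMAL)
				break;
			if (prior_xmax != 0 && member->xmin != prior_xmax)
				break;			/* xmin/xmax mismatch: chain is broken */
			roots[next - 1] = off;
			if (!member->hot_updated)
				break;
			prior_xmax = member->xmax;
			next = member->next;
		}
	}
}

/* Count heap-only tuples for which no root (parent) was found. */
static unsigned
toy_count_orphans(const ToyItem *items, unsigned n, const unsigned *roots)
{
	unsigned	orphans = 0;

	for (unsigned i = 0; i < n; i++)
		if (items[i].kind == TOY_NORMAL && items[i].heap_only && roots[i] == 0)
			orphans++;
	return orphans;
}
```

In this model, a nonzero result from toy_count_orphans() corresponds to the
situation where the "failed to find parent tuple for heap-only tuple" error
would be raised.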

[1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com
[2] https://github.com/petergeoghegan/amcheck#optional-heapallindexed-verification
[3] https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
--
Peter Geoghegan

Attachment: 0001-Promote-HOT-parent-tuple-elog-to-an-ereport.patch (text/x-patch, 3.0 KB)
