Re: HOT chain validation in verify_heapam()

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2023-03-22 21:41:50
Message-ID: CAH2-Wzk8OYkca7af+dGNGdMATn5d_dqZEsk=e45BndvjQGcDuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 22, 2023 at 2:14 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> > and expects to find an item it can dereference - but I don't think that's
> > something we can rely on: Afaics HOT pruning can break chains, but doesn't
> > reset xmax.
>
> I think that we need two passes to be completely thorough. An initial
> pass, that works pretty much as-is, plus a second pass that locates
> any orphaned heap-only tuples -- heap-only tuples that were not deemed
> part of a valid HOT chain during the first pass. These remaining
> orphaned heap-only tuples should be verified as having come from
> now-aborted transactions (they should definitely be fully DEAD) --
> otherwise we have corruption.

I see that there is a second pass over the heap page in
verify_heapam(), in fact. Kind of. I'm referring to this loop:

/*
* An update chain can start either with a non-heap-only tuple or with
* a redirect line pointer, but not with a heap-only tuple.
*
* (This check is in a separate loop because we need the predecessor
* array to be fully populated before we can perform it.)
*/
for (ctx.offnum = FirstOffsetNumber;
ctx.offnum <= maxoff;
ctx.offnum = OffsetNumberNext(ctx.offnum))
{
if (xmin_commit_status_ok[ctx.offnum] &&
(xmin_commit_status[ctx.offnum] == XID_COMMITTED ||
xmin_commit_status[ctx.offnum] == XID_IN_PROGRESS) &&
predecessor[ctx.offnum] == InvalidOffsetNumber)
{
ItemId curr_lp;

curr_lp = PageGetItemId(ctx.page, ctx.offnum);
if (!ItemIdIsRedirected(curr_lp))
{
HeapTupleHeader curr_htup;

curr_htup = (HeapTupleHeader)
PageGetItem(ctx.page, curr_lp);
if (HeapTupleHeaderIsHeapOnly(curr_htup))
report_corruption(&ctx,
psprintf("tuple is root of
chain but is marked as heap-only tuple"));
}
}
}

However, this "second pass over page" loop has roughly the same
problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it
doesn't account for the fact that a tuple whose xmin was
XID_IN_PROGRESS a little earlier on may not be in that state once we
reach the second pass loop. Concurrent transaction abort needs to be
accounted for. The loop needs to recheck xmin status, at least in the
initially-XID_IN_PROGRESS-xmin case.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2023-03-22 21:50:07 Re: Options to rowwise persist result of stable/immutable function with RECORD result
Previous Message Imseih (AWS), Sami 2023-03-22 21:35:23 Re: [BUG] pg_stat_statements and extended query protocol