Re: HOT chain validation in verify_heapam()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-11-16 17:53:42
Message-ID: CA+TgmoZuTuqTDDz+GeYkpyPFb3bx0Zjw+X86zzLasYG3J4ij+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 16, 2022 at 4:51 AM Himanshu Upadhyaya
<upadhyaya(dot)himanshu(at)gmail(dot)com> wrote:
> yes, got it, have tried to test and it is giving false corruption in case of subtransaction.
> I think a better way to have this check is, we need to check that if pred_xmin is
> aborted then current_xmin should be aborted only. So there is no way that we
> validate corruption with in_progress txid.

Please note that you can't use TransactionIdDidAbort here, because
that will return false for transactions aborted by a crash. You have
to check that it's not in progress and then afterwards check that it's
not committed. Also note that if you check whether it's committed
first and then check whether it's in progress afterwards, there's a
race condition: it might commit just after you verify that it isn't
committed yet, and then it won't be in progress any more and will look
aborted.

I disagree with the idea that we can't check in progress. I think the
checks could look something like this:

pred_in_progress = TransactionIdIsInProgress(pred_xmin);
current_in_progress = TransactionIdIsInProgress(current_xmin);
if (pred_in_progress)
{
if (current_in_progress)
return ok;
// recheck to avoid race condition
if (TransactionIdIsInProgress(pred_xmin))
{
if (TransactionIdDidCommit(current_xmin))
return corruption: predecessor xmin in progress, but
current xmin committed;
else
return corruption: predecessor xmin in progress, but
current xmin aborted;
}
// fallthrough: when we entered this if-block pred_xmin was still
in progress but no longer;
pred_in_progress = false;
}

if (TransactionIdDidCommit(pred_xmin))
return ok;

if (current_in_progress)
return corruption: predecessor xmin aborted, but current xmin in progress;
else if (TransactionIdDidCommit(current_xmin))
return corruption: predecessor xmin aborted, but current xmin committed;

The error messages as phrased here aren't actually what we should use;
they would need rephrasing. But I think, or hope anyway, that the
logic works. I think you basically just need the 1 recheck: if you see
the predecessor xmin in progress and then the current xmin in
progress, you have to go back and check that the predecessor xmin is
still in progress, because otherwise both could have committed or
aborted together in between.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2022-11-16 17:56:11 Re: Small miscellaneous fixes
Previous Message Andres Freund 2022-11-16 17:41:10 Re: [PoC] configurable out of disk space elog level