Re: Race condition in TransactionIdIsInProgress

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race condition in TransactionIdIsInProgress
Date: 2022-02-11 19:08:48
Message-ID: 20220211190848.patwql5a5nyoolot@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> >
> > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > increase the window between the TransactionIdCommitTree() and
> > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
> > > least it might make it easier to write tests exercising this scenario...
> >
> > Agreed
> >
> > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > item cache is set too early in some cases. The single item cache is
> > important for performance, so we just need to be more careful about
> > setting the cache.
>
> Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.

> just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?

> just removes the known offending call, passes make check, but IMHO
> leaves the same error just as likely by other callers.

Which other callers are you referring to?

To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status. The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.

It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2022-02-11 19:48:21 Fix overflow in DecodeInterval
Previous Message Justin Pryzby 2022-02-11 18:51:50 Re: support for MERGE