Re: Race condition in TransactionIdIsInProgress

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race condition in TransactionIdIsInProgress
Date: 2022-02-12 21:50:35
Message-ID: 20220212215035.ge2elo73ykxhofuo@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> > > >
> > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > > increase the window between the TransactionIdCommitTree() and
> > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
> > > > > least it might make it easier to write tests exercising this scenario...
> > > >
> > > > Agreed
> > > >
> > > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > > item cache is set too early in some cases. The single item cache is
> > > > important for performance, so we just need to be more careful about
> > > > setting the cache.
> > >
> > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > > being set, but this fails! Not worked out why, yet.
> >
> > I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> > do TransactionLogFetch() internally.
>
> That's not correct because you're confusing
> TransactionIdKnownNotInProgress(), which is a new function introduced
> by the patch, with the existing function
> TransactionIdIsKnownCompleted().

I don't think so. This call of the new TransactionIdKnownNotInProgress()

> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
> return TRANSACTION_STATUS_ABORTED;
> }
>
> + /*
> + * Safeguard that we have called TransactionIsIdInProgress() before
> + * checking commit log manager, to ensure that we do not cache the
> + * result until the xid is no longer in the procarray at eoxact.
> + */
> + if (!TransactionIdKnownNotInProgress(transactionId))
> + return TRANSACTION_STATUS_IN_PROGRESS;
> +
> /*
> * Get the transaction status.
> */

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.

> > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >
> > I think this might contain a different diff than what the name suggests?
>
> Not at all, please don't be confused by the name. The patch removes
> the call to TransactionIdIsKnownCompleted() from
> TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...

> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.

> > > just removes the known offending call, passes make check, but IMHO
> > > leaves the same error just as likely by other callers.
> >
> > Which other callers are you referring to?
>
> I don't know of any, but we can't just remove a function in a
> backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-02-12 22:03:03 Re: buildfarm warnings
Previous Message Tom Lane 2022-02-12 21:42:03 Re: buildfarm warnings