Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: Problem with txid_snapshot_in/out() functionality)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <jan(at)wi3ck(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: Problem with txid_snapshot_in/out() functionality)
Date: 2014-05-15 14:21:28
Message-ID: 5374CD68.5040304@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/06/2014 02:44 PM, Andres Freund wrote:
> On 2014-05-05 13:41:00 +0300, Heikki Linnakangas wrote:
>> +/*
>> + * Exit hook to unlock the global transaction entry we're working on.
>> + */
>> +static void
>> +AtProcExit_Twophase(int code, Datum arg)
>> +{
>> + /* same logic as abort */
>> + AtAbort_Twophase();
>> +}
>> +
>> +/*
>> + * Abort hook to unlock the global transaction entry we're working on.
>> + */
>> +void
>> +AtAbort_Twophase(void)
>> +{
>> + if (MyLockedGxact == NULL)
>> + return;
>> +
>> + /*
>> + * If we were in process of preparing the transaction, but haven't
>> + * written the WAL record yet, remove the global transaction entry.
>> + * Same if we are in the process of finishing an already-prepared
>> + * transaction, and fail after having already written the WAL 2nd
>> + * phase commit or rollback record.
>> + *
>> + * After that it's too late to abort, so just unlock the GlobalTransaction
>> + * entry. We might not have transfered all locks and other state to the
>> + * prepared transaction yet, so this is a bit bogus, but it's the best we
>> + * can do.
>> + */
>> + if (!MyLockedGxact->valid)
>> + {
>> + RemoveGXact(MyLockedGxact);
>> + }
>> + else
>> + {
>> + LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>> +
>> + MyLockedGxact->locking_backend = InvalidBackendId;
>> +
>> + LWLockRelease(TwoPhaseStateLock);
>> + }
>> + MyLockedGxact = NULL;
>> +}
>
> Is it guaranteed that all paths have called LWLockReleaseAll()
> before calling the proc exit hooks? Otherwise we might end up waiting
> for ourselves...

Hmm. AbortTransaction() will release locks before we get here, but the
before_shmem_exit() callpath will not. So an elog(FATAL), while holding
TwoPhaseStateLock would cause us to deadlock with ourself. But there are
no such elogs.

I copied this design from async.c, which is quite similar, so if there's
a problem that ought to be fixed too. And there are other more
complicated before_shmem callbacks that worry me more, like
createdb_failure_callback(). But I think they're all all right.

>> /*
>> * MarkAsPreparing
>> @@ -261,29 +329,15 @@ MarkAsPreparing(TransactionId xid, const char *gid,
>> errmsg("prepared transactions are disabled"),
>> errhint("Set max_prepared_transactions to a nonzero value.")));
>>
>> - LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>> -
>> - /*
>> - * First, find and recycle any gxacts that failed during prepare. We do
>> - * this partly to ensure we don't mistakenly say their GIDs are still
>> - * reserved, and partly so we don't fail on out-of-slots unnecessarily.
>> - */
>> - for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>> + /* on first call, register the exit hook */
>> + if (!twophaseExitRegistered)
>> {
>> - gxact = TwoPhaseState->prepXacts[i];
>> - if (!gxact->valid && !TransactionIdIsActive(gxact->locking_xid))
>> - {
>> - /* It's dead Jim ... remove from the active array */
>> - TwoPhaseState->numPrepXacts--;
>> - TwoPhaseState->prepXacts[i] = TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts];
>> - /* and put it back in the freelist */
>> - gxact->next = TwoPhaseState->freeGXacts;
>> - TwoPhaseState->freeGXacts = gxact;
>> - /* Back up index count too, so we don't miss scanning one */
>> - i--;
>> - }
>> + before_shmem_exit(AtProcExit_Twophase, 0);
>> + twophaseExitRegistered = true;
>> }
>
> It's not particularly nice to register shmem exit hooks in the middle of
> normal processing because it makes it impossible to use
> cancel_before_shmem_exit() previously registered hooks. I think this
> should be registered at startup, if max_prepared_xacts > 0.

<shrug>. async.c and namespace.c does the same, and it hasn't been a
problem.

I committed this now, but please let me know if you see a concrete
problem with the locks.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-05-15 14:24:25 Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Previous Message Arthur Axel 'fREW' Schmidt 2014-05-15 13:35:20 Re: GSOC13 proposal - extend RETURNING syntax