Hot standby, race condition between recovery snapshot and commit

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Hot standby, race condition between recovery snapshot and commit
Date: 2009-11-14 12:59:16
Message-ID: 4AFEA9A4.5060808@enterprisedb.com
Lists: pgsql-hackers

There's a race condition between transaction commit and
GetRunningTransactionData(). If GetRunningTransactionData() runs between
the RecordTransactionCommit() and ProcArrayEndTransaction() calls in
CommitTransaction():

> 	/*
> 	 * Here is where we really truly commit.
> 	 */
> 	latestXid = RecordTransactionCommit(false);
>
> 	TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);
>
> 	/*
> 	 * Let others know about no transaction in progress by me. Note that this
> 	 * must be done _before_ releasing locks we hold and _after_
> 	 * RecordTransactionCommit.
> 	 */
> 	ProcArrayEndTransaction(MyProc, latestXid);

The running-xacts snapshot will include the transaction that's just
committing, but its commit record will precede the running-xacts WAL
record. If the standby initializes transaction tracking from that
running-xacts record, it will consider the just-committed transaction
as still in-progress until the next running-xacts record (at the next
checkpoint).
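
To make the ordering problem concrete, here is a toy model of a standby that
replays a commit record followed by a running-xacts record that still lists
the committed xid. Everything in it (the ToyXid, ToyWalRecord and ToyStandby
types and the toy_* functions) is made up for illustration and is not the
actual recovery code; it only mimics a standby that trusts the first
running-xacts record as-is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not PostgreSQL's. */
typedef unsigned int ToyXid;

typedef enum { TOY_REC_COMMIT, TOY_REC_RUNNING_XACTS } ToyRecType;

typedef struct
{
    ToyRecType type;
    ToyXid     xids[4];  /* one xid for a commit, the snapshot otherwise */
    size_t     nxids;
} ToyWalRecord;

#define TOY_MAX_TRACKED 16

typedef struct
{
    ToyXid running[TOY_MAX_TRACKED];
    size_t nrunning;
    bool   initialized;
} ToyStandby;

/* Forget an xid if it is currently tracked as running. */
static void
toy_remove(ToyStandby *s, ToyXid xid)
{
    for (size_t i = 0; i < s->nrunning; i++)
    {
        if (s->running[i] == xid)
        {
            s->running[i] = s->running[--s->nrunning];
            return;
        }
    }
}

/* Naive replay: the first running-xacts record is trusted as-is. */
static void
toy_replay(ToyStandby *s, const ToyWalRecord *rec)
{
    if (rec->type == TOY_REC_COMMIT)
    {
        /* No effect if tracking has not been initialized yet. */
        toy_remove(s, rec->xids[0]);
    }
    else if (!s->initialized)
    {
        for (size_t i = 0; i < rec->nxids && i < TOY_MAX_TRACKED; i++)
            s->running[s->nrunning++] = rec->xids[i];
        s->initialized = true;
    }
}

static bool
toy_is_running(const ToyStandby *s, ToyXid xid)
{
    for (size_t i = 0; i < s->nrunning; i++)
        if (s->running[i] == xid)
            return true;
    return false;
}
```

Replaying a commit record for xid 100 and then a running-xacts record listing
{100, 101} leaves toy_is_running() returning true for xid 100, since no later
commit record arrives to remove it.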

I can't see any obvious way around that. We could have transaction
commit acquire the new RecoveryInfoLock across those two calls, but I'd
like to avoid putting any extra overhead into such a critical path.

Hmm, actually ProcArrayApplyRecoveryInfo() could check every xid in the
running-xacts record against clog. If it's marked as finished in clog
already (because we already saw the commit/abort record before the
running-xacts record), we know it's not running after all.
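
A rough sketch of that filtering step, again as a self-contained toy rather
than a patch: ToyClog stands in for clog, and toy_filter_running_xacts() is a
hypothetical helper, not an existing function. In the real code the status
lookup would presumably go through something like TransactionIdDidCommit()
and TransactionIdDidAbort(), but that is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only: these names are hypothetical, not PostgreSQL's. */
typedef unsigned int ToyXid;

typedef enum
{
    TOY_IN_PROGRESS = 0,
    TOY_COMMITTED,
    TOY_ABORTED
} ToyXidStatus;

#define TOY_CLOG_SIZE 1024

/* A tiny in-memory "clog": one status slot per xid (modulo the size). */
typedef struct
{
    ToyXidStatus status[TOY_CLOG_SIZE];
} ToyClog;

/* True if a commit or abort for this xid has already been recorded. */
static bool
toy_xid_finished(const ToyClog *clog, ToyXid xid)
{
    ToyXidStatus st = clog->status[xid % TOY_CLOG_SIZE];

    return st == TOY_COMMITTED || st == TOY_ABORTED;
}

/*
 * Copy into 'out' only those xids from the running-xacts snapshot that the
 * toy clog does not already mark as finished.  Returns the number kept.
 * 'out' must have room for at least nxids entries.
 */
static size_t
toy_filter_running_xacts(const ToyClog *clog,
                         const ToyXid *xids, size_t nxids,
                         ToyXid *out)
{
    size_t nkept = 0;

    for (size_t i = 0; i < nxids; i++)
    {
        if (!toy_xid_finished(clog, xids[i]))
            out[nkept++] = xids[i];
    }
    return nkept;
}
```

With xid 100 marked committed in the toy clog, filtering the snapshot
{100, 101, 102} keeps only 101 and 102, so the standby would start tracking
just those two as running.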

Because of the order in which commit removes the entry from the procarray
and releases locks, it also seems possible for GetRunningTransactionData()
to acquire a snapshot that contains an AccessExclusiveLock for a
transaction whose XID is not listed as running in the XID list. That
sounds like trouble too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
