From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
Subject: | Potential hot-standby bug around xacts committed but in xl_running_xacts |
Date: | 2017-05-01 20:38:48 |
Message-ID: | 20170501203848.eptgwp6xmesxq23u@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
The thread below http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
describes an issue in logical decoding that arises because
xl_running_xacts' contents aren't necessarily coherent with the contents
of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
don't have any interlock against the procarray. That means
xl_running_xacts can list transactions as still running even though
their commit/abort records have already been WAL logged.
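
To make the window concrete, here is a toy model of the primary side. It is
only a sketch, not PostgreSQL code: `wal` and `procarray` are plain Python
containers, the xids are made up, and the helpers are named after
RecordTransactionCommit(), ProcArrayEndTransaction() and LogStandbySnapshot()
purely for orientation (the latter two are not mentioned in this mail and are
assumed to be the relevant steps).

```python
# Toy model of the primary; illustrative only, not PostgreSQL code.
wal = []            # ordered list of (record_type, payload)
procarray = set()   # xids the primary currently considers running


def begin(xid):
    procarray.add(xid)


def record_transaction_commit(xid):
    # Stand-in for RecordTransactionCommit(): writes the commit record,
    # but does not touch the procarray.
    wal.append(("commit", xid))


def proc_array_end_transaction(xid):
    # Stand-in for the later, separate step that removes the xid
    # from the procarray.
    procarray.discard(xid)


def log_standby_snapshot():
    # Stand-in for logging xl_running_xacts: copies whatever the
    # procarray contains right now into the WAL.
    wal.append(("running_xacts", sorted(procarray)))


begin(100)
begin(101)
record_transaction_commit(100)   # commit record is in the WAL ...
log_standby_snapshot()           # ... but xid 100 is still in the procarray
proc_array_end_transaction(100)  # too late for the snapshot above

for record in wal:
    print(record)
```

In this interleaving the running_xacts record lands after the commit of xid
100 in the WAL and still lists xid 100 as running.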
I think that's not just problematic in logical decoding, but also in
Hot-Standby. Consider the following:
ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY. In that
case it'll populate the KnownAssignedXids machinery using
KnownAssignedXidsAdd().
Once STANDBY_SNAPSHOT_READY, CheckRecoveryConsistency() will signal
postmaster to allow connections.
For HS, a snapshot will be built by GetSnapshotData() using
KnownAssignedXidsGetAndSetXmin(). That in turn uses the transactions
currently known to be running, to populate the snapshot.
Now, if transactions have committed before (in the "earlier LSN" sense)
the xl_running_xacts record, ExpireTreeKnownAssignedTransactionIds() in
xact_redo_commit() will already have run, i.e. before
KnownAssignedXidsAdd() added those xids, and no later commit record will
remove them again. Which means we'll assume already committed
transactions are still running. In other words, the snapshot is
corrupted.
Luckily this'll self-correct over time, fixed by
ExpireOldKnownAssignedTransactionIds().
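
The replay order described above can be sketched with a similar toy model of
the standby. Again this is not PostgreSQL code: the set and the helper names
are hypothetical stand-ins, the xids are invented, and the self-correction
step assumes that a later xl_running_xacts record carries an oldest running
xid that is past the stale one.

```python
# Toy model of standby replay; illustrative only, not PostgreSQL code.
known_assigned_xids = set()
state = "STANDBY_INITIALIZED"


def xact_redo_commit(xid):
    # Models ExpireTreeKnownAssignedTransactionIds(): removing an xid
    # that is not tracked (yet) is simply a no-op.
    known_assigned_xids.discard(xid)


def apply_running_xacts(xids, oldest_running_xid, suboverflowed=False):
    # Models ProcArrayApplyRecoveryInfo() for an xl_running_xacts record.
    global state
    if state == "STANDBY_SNAPSHOT_READY":
        # Models ExpireOldKnownAssignedTransactionIds(): drop every
        # tracked xid older than the oldest one reported as running.
        for xid in [x for x in known_assigned_xids if x < oldest_running_xid]:
            known_assigned_xids.discard(xid)
        return
    if not suboverflowed:
        known_assigned_xids.update(xids)   # models KnownAssignedXidsAdd()
        state = "STANDBY_SNAPSHOT_READY"


def snapshot_xip():
    # Models the "running" list a hot-standby snapshot is built from.
    return sorted(known_assigned_xids)


xact_redo_commit(100)                   # commit of 100 is replayed first
apply_running_xacts([100, 101], 100)    # record still lists 100 as running
stale = snapshot_xip()                  # 100 is treated as in progress

apply_running_xacts([101], 101)         # a later record expires old xids
healed = snapshot_xip()

print(stale, healed)
```

Between the two running_xacts records, a snapshot taken on the standby would
treat the already committed xid 100 as still in progress.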
Am I missing something that protects against the above scenario?
Greetings,
Andres Freund