pgsql: Fix bugs in the hot standby known-assigned-xids tracking logic.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix bugs in the hot standby known-assigned-xids tracking logic.
Date: 2010-12-07 08:41:26
Message-ID: E1PPt70-0007In-Nj@gemulon.postgresql.org
Lists: pgsql-committers

Fix bugs in the hot standby known-assigned-xids tracking logic. If there's
an old transaction running in the master, a lot of transactions have
started and finished since, and a WAL record is written in the gap between
creating the running-xacts snapshot and WAL-logging it, recovery will fail
with a "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.
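A simplified model of the first failure follows. It assumes, as the next paragraph suggests, that tracking starts from the oldest still-running xid, and that before the running-xacts record is seen the standby cannot tell which xids in the gap have already finished, so every xid up to the newest one seen in WAL has to be presumed running. ToyKnownAssigned, toy_init, toy_observe and the 64-slot capacity are hypothetical; this is an illustration, not the procarray.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_KNOWN_ASSIGNED 64   /* hypothetical, deliberately small */

typedef struct {
    uint32_t xids[TOY_MAX_KNOWN_ASSIGNED];
    int count;
    uint32_t latest_observed;       /* highest xid already accounted for */
} ToyKnownAssigned;

/* Begin tracking; xids <= start are treated as already accounted for. */
static void toy_init(ToyKnownAssigned *ka, uint32_t start) {
    ka->count = 0;
    ka->latest_observed = start;
}

/* A WAL record for xid x arrives. The toy standby cannot yet tell which
 * xids in (latest_observed, x] have finished, so it records all of them
 * as possibly running. Returns false on overflow, the toy equivalent of
 * "too many KnownAssignedXids". Xid wraparound is ignored. */
static bool toy_observe(ToyKnownAssigned *ka, uint32_t x) {
    assert(ka->count <= TOY_MAX_KNOWN_ASSIGNED);
    while (ka->latest_observed < x) {
        if (ka->count == TOY_MAX_KNOWN_ASSIGNED)
            return false;
        ka->xids[ka->count++] = ++ka->latest_observed;
    }
    return true;
}
```

Starting from an old xid such as 100 and then seeing a record for xid 1000 overflows the toy array, while starting from 995 adds only the five xids 996 to 1000.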

In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still-running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that had already committed before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.
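The data-loss path can be sketched with a toy clog that, like the idea behind ExtendCLOG, zeroes a page whenever extension reaches the first xid on that page. The 8-xid pages, the status values and all toy_* names are hypothetical (a real clog page covers far more transactions); the sketch only shows that extending from a starting xid that is too low sweeps over pages that already hold commit status.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy clog with tiny pages so the effect is easy to see. */
#define TOY_XACTS_PER_PAGE 8
#define TOY_NPAGES 4
#define TOY_IN_PROGRESS 0           /* a zeroed page reads as "in progress" */
#define TOY_COMMITTED 1

typedef struct {
    uint8_t status[TOY_NPAGES][TOY_XACTS_PER_PAGE];
    uint32_t latest_extended;       /* highest xid the clog is known to cover */
} ToyClog;

static void toy_clog_set(ToyClog *c, uint32_t xid, uint8_t st) {
    assert(xid / TOY_XACTS_PER_PAGE < TOY_NPAGES);
    c->status[xid / TOY_XACTS_PER_PAGE][xid % TOY_XACTS_PER_PAGE] = st;
}

static uint8_t toy_clog_get(const ToyClog *c, uint32_t xid) {
    assert(xid / TOY_XACTS_PER_PAGE < TOY_NPAGES);
    return c->status[xid / TOY_XACTS_PER_PAGE][xid % TOY_XACTS_PER_PAGE];
}

/* Extend one xid at a time; reaching the first xid of a page is taken to
 * mean the page is brand new, so the whole page is zeroed. */
static void toy_clog_extend_to(ToyClog *c, uint32_t xid) {
    while (c->latest_extended < xid) {
        uint32_t next = ++c->latest_extended;
        if (next % TOY_XACTS_PER_PAGE == 0) {
            assert(next / TOY_XACTS_PER_PAGE < TOY_NPAGES);
            memset(c->status[next / TOY_XACTS_PER_PAGE], TOY_IN_PROGRESS,
                   sizeof c->status[0]);
        }
    }
}
```

If xid 12 committed before the checkpoint and extension starts from an oldest running xid of 5, reaching xid 14 passes xid 8, zeroes the page holding xids 8 to 15, and erases the commit of xid 12. Starting from 13 instead leaves it intact.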

In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix this, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recovery
we know that the snapshot contains all transactions running at that point in
WAL.
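The fix amounts to widening a critical section. In the single-threaded sketch below, a pthread mutex stands in for XidGenLock, and a pthread_mutex_trylock call plays a second backend trying to start a transaction in the gap between taking the running-xacts snapshot and logging it. ToyMaster, toy_log_running_xacts and the other toy_* names are hypothetical; the sketch models only xid assignment, not transaction end, and is not the code in standby.c or procarray.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WAL_MAX 32
#define TOY_SNAPSHOT_RECORD 0   /* xid 0 is never assigned; marks the snapshot */

/* Hypothetical toy primary, simulated in a single thread. */
typedef struct {
    pthread_mutex_t xid_gen_lock;   /* stands in for XidGenLock */
    uint32_t next_xid;
    uint32_t wal[TOY_WAL_MAX];      /* one entry per toy WAL record */
    int wal_len;
    uint32_t snapshot_next_xid;     /* next_xid stored in the logged snapshot */
    uint32_t gap_xid;               /* xid assigned in the gap, 0 if it had to wait */
} ToyMaster;

static void toy_master_init(ToyMaster *m, uint32_t next_xid) {
    assert(next_xid != TOY_SNAPSHOT_RECORD);
    pthread_mutex_init(&m->xid_gen_lock, NULL);
    m->next_xid = next_xid;
    m->wal_len = 0;
    m->snapshot_next_xid = 0;
    m->gap_xid = 0;
}

static void toy_append_wal(ToyMaster *m, uint32_t entry) {
    assert(m->wal_len < TOY_WAL_MAX);
    m->wal[m->wal_len++] = entry;
}

/* Another backend starts a transaction. Returns 0 if the lock is busy;
 * a real backend would block here instead of giving up. */
static uint32_t toy_try_assign_xid(ToyMaster *m) {
    if (pthread_mutex_trylock(&m->xid_gen_lock) != 0)
        return 0;
    uint32_t xid = m->next_xid++;
    toy_append_wal(m, xid);         /* the new transaction's first WAL record */
    pthread_mutex_unlock(&m->xid_gen_lock);
    return xid;
}

/* hold_lock = false: old ordering, lock released before logging.
 * hold_lock = true: the fix, lock held until the snapshot is in WAL. */
static void toy_log_running_xacts(ToyMaster *m, bool hold_lock) {
    pthread_mutex_lock(&m->xid_gen_lock);
    uint32_t snap_next_xid = m->next_xid;   /* take the snapshot */
    if (!hold_lock)
        pthread_mutex_unlock(&m->xid_gen_lock);

    m->gap_xid = toy_try_assign_xid(m);     /* activity in the gap */

    toy_append_wal(m, TOY_SNAPSHOT_RECORD); /* WAL-log the snapshot */
    m->snapshot_next_xid = snap_next_xid;
    if (hold_lock)
        pthread_mutex_unlock(&m->xid_gen_lock);
}

/* Toy recovery assumption: every xid whose record precedes the snapshot
 * record is older than the snapshot's next xid. */
static bool toy_snapshot_consistent(const ToyMaster *m) {
    for (int i = 0; i < m->wal_len && m->wal[i] != TOY_SNAPSHOT_RECORD; i++)
        if (m->wal[i] >= m->snapshot_next_xid)
            return false;
    return true;
}
```

With hold_lock false, the second backend gets xid 100 in the gap; its record precedes a snapshot whose next xid is still 100, so the check fails. With hold_lock true, the trylock reports the lock as busy, so in a real system that backend would wait and its xid would be assigned only after the snapshot record, where recovery can account for it.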

Branch
------
REL9_0_STABLE

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=799d0b4b9ede51c629149185e4058c52117cd231

Modified Files
--------------
src/backend/access/transam/xlog.c | 2 -
src/backend/storage/ipc/procarray.c | 134 +++++++++++-----------------------
src/backend/storage/ipc/standby.c | 11 +--
src/include/storage/procarray.h | 1 -
4 files changed, 47 insertions(+), 101 deletions(-)
