Re: hot standby - merged up to CVS HEAD

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, jd(at)commandprompt(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: hot standby - merged up to CVS HEAD
Date: 2009-08-27 18:08:28
Message-ID: 1251396508.4839.1154.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Mon, 2009-08-17 at 11:19 +0300, Heikki Linnakangas wrote:

> I think there's a race condition in the way LogCurrentRunningXacts() is
> called at the end of checkpoint. This can happen in the master:
>
> 1. Checkpoint starts
> 2. Transaction 123 begins, and does some updates
> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
> 4. LogCurrentRunningXacts() gets the list of currently running
> transactions by calling GetCurrentTransactionData().
> 5. Transaction 123 ends, writing commit record to WAL
> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
> includes XID 123, since that was still running at step 4.
>
> When that is replayed, ProcArrayUpdateTransactions() will zap the
> unobserved xids array with the list that includes XID 123, even though
> we already saw a commit record for it.

That's not a race condition, but it does make the code more complex. The
issue has been long understood.

I don't think it's acceptable to take and hold both ProcArray and
WALInsertLock. Those are now the two most heavily contended locks on the
system. We have evidence that there are burst delays associated with
various operations on just one of those locks, let alone two.

If you're still doubtful, the problem I've been working on recently is
the point that I overlooked the initial state of the lock table in my
earlier patch. GetRunningTransactionData() also needs to have initial
lock data.

There is no way in hell that I could personally condone holding
ProcArrayLock, WALInsertLock and all of the LockMgrLock partitions at
same time. So we just have to eat the complexity. (No doubt someone will
disagree with my strong language here, but please take it as an
indication of exactly how bad an idea holding multiple locks will be).

Slight timing issues are not too bad really. We just have to be careful
to assume that there is a mismatch in the data and must have code to
handle that.

Anyway, I've been working on this problem for some time and continue to
do so.

--
Simon Riggs www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-08-27 18:17:22 Re: pretty print viewdefs
Previous Message Andrew Dunstan 2009-08-27 18:04:00 Re: pretty print viewdefs