Re: Hot Standby 0.2.1

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby 0.2.1
Date: 2009-10-02 20:29:53
Message-ID: 1254515393.4691.45.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Sun, 2009-09-27 at 14:59 +0300, Heikki Linnakangas wrote:
> The locking in smgr_redo_commit and smgr_redo_abort doesn't look right.
> First of all, smgr_redo_abort is not holding XidGenLock and
> ProcArrayLock while modifying ShmemVariableCache->nextXid and
> ShmemVariableCache->latestCompletedXid, respectively, like
> smgr_redo_commit is. Attached patch fixes that.
>
> But I think there's a little race condition in there anyway, as they
> remove the finished xids from known-assigned-xids list by calling
> ExpireTreeKnownAssignedTransactionIds, and
> ShmemVariableCache->latestCompletedXid is updated only after that. We're
> not holding ProcArrayLock between those two steps, like we do during
> normal transaction commit. I'm not sure what kind of issues that can
> lead to, but it seems like it can lead to broken snapshots if someone
> calls GetSnapshotData() between those steps.

I confess I didn't know what you were talking about when you wrote this
and was expecting you to have a coffee and retract. I realise now you
meant xact_redo_commit() rather than smgr_ and it makes sense at last.

I've just committed about half of your patch exactly, but not all.

I've avoided making the changes to
ShmemVariableCache->latestCompletedXid directly and moved the update to
ExpireTreeKnownAssignedTransactionIds() which already had access to
max_xid while holding ProcArrayLock. Hopefully that resolves your
comment about race condition.

Also, I noticed that you removed the line
TransactionIdAdvance(ShmemVariableCache->nextXid);
in xact_redo_abort(). That looks like an error to me, since this follows
neither the comment nor how it is coded in xact_redo_commit(). Let me
know if there was some other reason for that line removal and I'll make
the change again.

--
Simon Riggs www.2ndQuadrant.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-10-02 20:32:45 Re: Hot Standby on git
Previous Message decibel 2009-10-02 19:59:19 Re: FSM search modes