Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: wangchuanting <wangchuanting(at)huawei(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Date: 2017-06-11 02:25:36
Message-ID: 20170611022536.goef4cdbmgicye5g@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Michael Paquier wrote:

> > i look carefully on the patch, because of we removed TwoPhaseStateLock
> > lwlock acquire in `RemoveGXact' and let caller held lwlock, so i think:
> > 1. xact_redo also need held lwlock before call PrepareRedoRemove
> > 2. RecoverPreparedTransactions also need held lwlock before call
> > ProcessTwoPhaseBuffer
>
> Thanks for the input. I have reviewed all the code paths that have
> been modified, and strengthened the code with assertions using
> LWLockHeldByMeInMode() to make sure that the correct lock is always
> hold in those code paths. There is actually no point in holding the
> lock in restoreTwoPhaseData(), but as this makes the code less
> consistent with the rest I added one. Also, I have replaced the lock
> acquisition in PrepareRedoAdd() with acquisitions at higher levels,
> and added an assertion in this routine. This makes the 2PC state data
> addition and removal more consistent with each other.

Thanks for the patch -- I agree that the new coding looks better.

I also agree that restoreTwoPhaseData doesn't need to hold the lock,
since we don't expect anything running concurrently, but that it's okay
to hold it and makes the whole thing simpler.

We could try to apply an equivalent rationale to
RecoverPreparedTransactions: even though we have already been running in
HOT standby mode for a while, there's no possibility of concurrent
activity either: since we exited recovery, master cannot write any more
2PC xacts, and we haven't yet set the flag that lets open sessions write
WAL. However, it seems mildly dangerous to run the bottom sections of
the loop with the lock held, since it acquires other lwlocks and
generally does a lot of crap.

Also, let's add an lwlock-is-held assertion to MarkAsPreparingGuts.

BTW: I think that saving one redundant line of code is not worth the
ugliness that it costs in xact_redo, so let's undo that. The patch is a
bit bulky, but the resulting code is simpler.

In short, I propose the attached.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
2pc-redo-lwlock-fix-v4.patch text/plain 12.6 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2017-06-11 03:24:48 Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Previous Message Alvaro Herrera 2017-06-11 00:22:24 Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-06-11 03:24:48 Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Previous Message Tom Lane 2017-06-11 01:31:25 Re: memory fields from getrusage()