Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: wangchuanting(at)huawei(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Date: 2017-06-01 08:07:53
Message-ID: CAB7nPqQE1HLX3dksYahGr+rSRfdmuO5ooEvBf+T0u7m4FegSPQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, May 31, 2017 at 12:30 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, May 31, 2017 at 6:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> wangchuanting(at)huawei(dot)com writes:
>>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
>>> redo 2PC xlog.
>>
>> Please provide an example of how to get into this state.
>
> That would help. Are you seeing in the logs something like "removing
> future two-phase state from memory for XXX" or "removing stale
> two-phase state from shared memory for XXX"?
>
> Even with that, the lightweight lock sequence taken in those code
> paths looks definitely wrong to me: we should not take
> TwoPhaseStateLock twice in the same code path. I think that we should
> remove the lock acquisitions in RemoveGXact() and PrepareRedoRemove(),
> and then upgrade the locks of PrescanPreparedTransactions() and
> StandbyRecoverPreparedTransactions() to be exclusive. We still need to
> keep a lock as CheckPointTwoPhase() may still be triggered by the
> checkpoint. Tom, what do you think?

Attached is what I was thinking about, for reference. I just came back
from a long flight and I am pretty tired, so my brain may have missed
something. I'll take another look at this issue on Monday; an open
item has been added for now.
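
For readers following along, the locking shape discussed in the quoted
proposal can be sketched in isolation. This is not the attached patch and
not PostgreSQL code: ToyLock, toy_acquire(), toy_release() and the scan_*
and remove_entry_* functions are hypothetical stand-ins, and the toy lock
reports a failure at the point where a real non-reentrant lock would
block forever on itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-reentrant lock; not PostgreSQL's LWLock. */
typedef enum { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE } ToyLockMode;

typedef struct { ToyLockMode held_by_me; } ToyLock;

/*
 * Returns false where a real non-reentrant lock would wait forever
 * because this same process already holds it.
 */
static bool toy_acquire(ToyLock *lock, ToyLockMode mode)
{
    if (lock->held_by_me != TOY_UNLOCKED)
        return false;
    lock->held_by_me = mode;
    return true;
}

static void toy_release(ToyLock *lock)
{
    assert(lock->held_by_me != TOY_UNLOCKED);   /* must be held */
    lock->held_by_me = TOY_UNLOCKED;
}

/* Problematic shape: the callee takes the lock on its own. */
static bool remove_entry_locking(ToyLock *lock, int *nentries)
{
    if (!toy_acquire(lock, TOY_EXCLUSIVE))
        return false;                           /* self-deadlock detected */
    (*nentries)--;
    toy_release(lock);
    return true;
}

/* Proposed shape: the callee relies on the caller holding it exclusively. */
static bool remove_entry_caller_locked(ToyLock *lock, int *nentries)
{
    if (lock->held_by_me != TOY_EXCLUSIVE)
        return false;
    (*nentries)--;
    return true;
}

/* Before: scan under a shared lock, then call a callee that locks again. */
static bool scan_before(ToyLock *lock, int *nentries)
{
    bool ok;

    if (!toy_acquire(lock, TOY_SHARED))
        return false;
    ok = remove_entry_locking(lock, nentries);
    toy_release(lock);
    return ok;
}

/* After: scan under an exclusive lock; the callee takes no lock. */
static bool scan_after(ToyLock *lock, int *nentries)
{
    bool ok;

    if (!toy_acquire(lock, TOY_EXCLUSIVE))
        return false;
    ok = remove_entry_caller_locked(lock, nentries);
    toy_release(lock);
    return ok;
}
```

With a fresh ToyLock, scan_before() fails at the nested acquisition and
leaves the entry count untouched, while scan_after() removes one entry.
The outer lock has to be exclusive in the second shape because the
callee modifies shared state without taking any lock itself.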
--
Michael

Attachment Content-Type Size
2pc-redo-lwlock-fix.patch application/octet-stream 4.3 KB
