Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: wangchuanting(at)huawei(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Date: 2017-05-31 19:30:56
Message-ID: CAB7nPqQeOx96RC19STwR4eqgPX-5J8Vbow7n2_3ghfNtM3N+NQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, May 31, 2017 at 6:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> wangchuanting(at)huawei(dot)com writes:
>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
>> redo 2PC xlog.
>
> Please provide an example of how to get into this state.

That would help. Are you seeing something in the logs like "removing
future two-phase state from memory for XXX" or "removing stale
two-phase state from shared memory for XXX"?

Even with that, the lightweight-lock sequence taken in those code
paths looks definitely wrong to me: we should not take
TwoPhaseStateLock twice in the same code path. I think that we should
remove the lock acquisitions in RemoveGXact() and PrepareRedoRemove(),
and then upgrade the locks taken in PrescanPreparedTransactions() and
StandbyRecoverPreparedTransactions() to be exclusive. We still need to
keep a lock, as CheckPointTwoPhase() may still be triggered by a
checkpoint. Tom, what do you think?
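
To illustrate the pattern, here is a rough sketch using a toy,
non-reentrant lock in place of the real LWLock. Everything in it
(ToyLock, toy_try_acquire(), the scan_* and remove_entry_* functions)
is made up for illustration and is not the twophase.c code. It assumes
the outer scan holds the lock in shared mode while the removal routine
asks for it in exclusive mode. A "try" acquire is used so that the
conflict shows up as a false return value instead of an actual hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lightweight lock (hypothetical). */
typedef enum { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE } ToyLockMode;

typedef struct ToyLock
{
    ToyLockMode mode;
    int         shared_holders;
} ToyLock;

/*
 * Returns false in every case where a blocking acquire would have to
 * wait. The lock does not track its owner, so a second request from the
 * same code path is treated like any other conflicting request.
 */
bool
toy_try_acquire(ToyLock *lock, ToyLockMode want)
{
    if (lock->mode == TOY_UNLOCKED)
    {
        lock->mode = want;
        lock->shared_holders = (want == TOY_SHARED) ? 1 : 0;
        return true;
    }
    if (lock->mode == TOY_SHARED && want == TOY_SHARED)
    {
        lock->shared_holders++;
        return true;
    }
    return false;               /* conflict: a blocking acquire waits here */
}

void
toy_release(ToyLock *lock)
{
    if (lock->mode == TOY_SHARED && --lock->shared_holders > 0)
        return;                 /* other shared holders remain */
    lock->mode = TOY_UNLOCKED;
    lock->shared_holders = 0;
}

/* Problematic shape: the callee takes the lock on its own. */
bool
remove_entry_locking(ToyLock *lock, int *nentries)
{
    if (!toy_try_acquire(lock, TOY_EXCLUSIVE))
        return false;           /* would wait on a lock we already hold */
    (*nentries)--;
    toy_release(lock);
    return true;
}

/* Proposed shape: the callee relies on the caller's exclusive lock. */
void
remove_entry_locked(ToyLock *lock, int *nentries)
{
    assert(lock->mode == TOY_EXCLUSIVE);
    (*nentries)--;
}

/* Outer scan holds the lock shared, then calls the locking remover. */
bool
scan_with_nested_locking(ToyLock *lock, int *nentries)
{
    bool        ok;

    if (!toy_try_acquire(lock, TOY_SHARED))
        return false;
    ok = remove_entry_locking(lock, nentries);
    toy_release(lock);
    return ok;
}

/* Outer scan takes the lock once, exclusively, for the whole path. */
bool
scan_with_single_exclusive_lock(ToyLock *lock, int *nentries)
{
    if (!toy_try_acquire(lock, TOY_EXCLUSIVE))
        return false;
    remove_entry_locked(lock, nentries);
    toy_release(lock);
    return true;
}
```

With a fresh ToyLock, scan_with_nested_locking() returns false and
leaves the entry count untouched, which is where a blocking acquire
would wait forever. scan_with_single_exclusive_lock() removes the entry
and releases the lock cleanly.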
--
Michael
