Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

From: Noah Misch <noah(at)leadboat(dot)com>
To: simon(at)2ndquadrant(dot)com
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, wangchuanting(at)huawei(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Date: 2017-06-10 06:31:53
Message-ID: 20170610063153.GA1619984@rfd.leadboat.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Jun 08, 2017 at 11:17:38PM -0700, Noah Misch wrote:
> On Sun, Jun 04, 2017 at 10:24:30PM +0000, Noah Misch wrote:
> > On Thu, Jun 01, 2017 at 01:07:53AM -0700, Michael Paquier wrote:
> > > On Wed, May 31, 2017 at 12:30 PM, Michael Paquier
> > > <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > > On Wed, May 31, 2017 at 6:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > >> wangchuanting(at)huawei(dot)com writes:
> > > >>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
> > > >>> redo 2PC xlog.
> > > >>
> > > >> Please provide an example of how to get into this state.
> > > >
> > > > That would help. Are you seeing in the logs something like "removing
> > > > future two-phase state from memory for XXX" or "removing stale
> > > > two-phase state from shared memory for XXX"?
> > > >
> > > > > Even with that, the lightweight lock sequence taken in those code
> > > > > paths looks definitely wrong to me: we should not take
> > > > > TwoPhaseStateLock twice in the same code path. I think that we should remove
> > > > > the lock acquisitions in RemoveGXact() and PrepareRedoRemove(), and then
> > > > upgrade the locks of PrescanPreparedTransactions() and
> > > > StandbyRecoverPreparedTransactions() to be exclusive. We still need to
> > > > keep a lock as CheckPointTwoPhase() may still be triggered by the
> > > > checkpoint. Tom, what do you think?
> > >
> > > Attached is what I was thinking about for reference. I just came back
> > > from a long flight and I am pretty tired, so my brain may have missed
> > > > something. I'll take another look at this issue on Monday; an open
> > > > item has been added for now.
> >
> > [Action required within three days. This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 10 open item. Simon,
> > since you committed the patch believed to have created it, you own this open
> > item. If some other commit is more relevant or if this does not belong as a
> > v10 open item, please let us know. Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within three calendar days of
> > this message. Include a date for your subsequent status update. Testers may
> > discover new open items at any time, and I want to plan to get them all fixed
> > well in advance of shipping v10. Consequently, I will appreciate your efforts
> > toward speedy resolution. Thanks.
> >
> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> This PostgreSQL 10 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update. Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
for your status update. Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately. If I do not hear from you by
2017-06-11 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
