Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: wangchuanting <wangchuanting(at)huawei(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog
Date: 2017-06-12 22:49:25
Message-ID: 20170612224925.h3tiogbfj4cjeobu@alvherre.pgsql
Lists: pgsql-bugs pgsql-hackers

Michael Paquier wrote:
> On Sun, Jun 11, 2017 at 12:24 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:

> I have reworked the comment as follows:
> /*
> - * Don't need a lock in the recovery phase.
> + * It is fine to access TwoPhaseState without a lock here: recovery is
> + * finished (so if we were a standby, there's no master that can prepare
> + * transactions anymore), and we haven't yet set WAL as open for writes,
> + * so local existing backends, if any, cannot do so either. We could use a
> + * coding pattern similar to restoreTwoPhaseData, i.e., run the whole loop
> + * with the lock held; but this loop is far more complex, so instead only
> + * grab the lock while calling the low-level functions working directly on
> + * manipulating the two-phase state data. Functions working directly on
> + * PGPROC entries linked with the two-phase transaction work with other
> + * types of locks but we don't want to complicate that more than necessary.
> */

Hmm. Honestly I don't like the final sentence you added. I find it
more confusing than useful, because it doesn't explain what these "other
types of locks" are, or why we care.

However, I found out that this rationale is likely not true, because the
checkpointer may be running concurrently with this code in the startup
process, and the checkpointer does process 2PC data. Maybe there are other
reasons why there's no live bug here, but it looks wrong (I didn't try
to reproduce a problem).
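
For reference, the two locking patterns contrasted in the quoted comment can
be sketched roughly as below. This is a standalone toy, not PostgreSQL code:
ToyTwoPhaseState, toy_lock and every function name are hypothetical, and a
pthread mutex stands in for TwoPhaseStateLock. The point is only that a
concurrent reader (modelled loosely on a checkpoint pass) is safe when every
access to the shared state goes through the same lock, whichever pattern the
writer uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_GXACTS 8

/* Hypothetical stand-in for the shared two-phase state array. */
typedef struct
{
    int xids[MAX_GXACTS];
    int count;
} ToyTwoPhaseState;

static ToyTwoPhaseState toy_state;
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_acquisitions;   /* demo bookkeeping, only changed under toy_lock */

static void
toy_lock_acquire(void)
{
    pthread_mutex_lock(&toy_lock);
    lock_acquisitions++;
}

static void
toy_lock_release(void)
{
    pthread_mutex_unlock(&toy_lock);
}

/* Low-level mutation of the shared state; caller must hold toy_lock. */
static void
toy_add_gxact(int xid)
{
    assert(toy_state.count < MAX_GXACTS);
    toy_state.xids[toy_state.count++] = xid;
}

/* Pattern A: run the whole loop with the lock held. */
static void
recover_coarse(const int *xids, size_t n)
{
    toy_lock_acquire();
    for (size_t i = 0; i < n; i++)
        toy_add_gxact(xids[i]);
    toy_lock_release();
}

/* Pattern B: take the lock only around the low-level call. */
static void
recover_fine(const int *xids, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        /* Complex per-transaction work that needs no lock would go here. */
        toy_lock_acquire();
        toy_add_gxact(xids[i]);
        toy_lock_release();
    }
}

/* A concurrent reader must take the same lock to see a consistent state. */
static int
checkpoint_count(void)
{
    int n;

    toy_lock_acquire();
    n = toy_state.count;
    toy_lock_release();
    return n;
}
```

Calling recover_coarse() on three entries takes the lock once, while
recover_fine() takes it three times but holds it only briefly each time. Either
way, checkpoint_count() never reads the array mid-update. Accessing toy_state
with no lock at all would be safe only if no other process could touch it,
which is the assumption being questioned above.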

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
