Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC
Date: 2026-03-25 03:27:10
Message-ID: 874im44mi9.fsf@163.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:

Hi,

Thanks for showing interests for this topic!

> On Wed, Mar 25, 2026 at 08:39:07AM +0800, Andy Fan wrote:
>> I found a similar but not exactly same case at 2014 [1] which
>> might be helpful to recall a boarder understanding on this area.
>>
>> [1] https://www.postgresql.org/message-id/534AF601.1030007%40vmware.com
>
> Incorrect shared state when an ERROR happens at an arbitrary location
> is usually bad, yes.
>
> For this one, your suggestion of delaying the end of the critical
> section started at StartPrepare() and ending in EndPrepare() is not an
> acceptable solution as far as I can see, unfortunately: it would mean
> doing a SyncRepWaitForLSN() while in a critical section, and I doubt
> we'd want to do that.

I do have a more general question: (Q1). what is critical section
designed to? (Q2). What is the badness if we put more code into it
except the ERROR->PANIC logic?

Then (Q3) will SyncRepWaitForLSN be possible to raise ERROR? if no for
now and possible in future, then my Q2 raise.

> Anyway, I doubt that this one is worth caring for. The current locking
> 2PC scheme means, as far as I remember, that it is not really possible
> to interact with an external command in a specific session between
> the EndPrepare() and the PostPrepare_Locks()
> calls.

Then Q3 comes. The deeper answer might be Q2 or Q1.

> To put it in other words, let's imagine that we use a breakpoint
> between these two calls (or a wait injection point if you automate
> that). Is it possible for a second backend to mess with the state of
> the first backend waiting until its locks are transfered to the dummy
> PGPROC entry? That's what the 2014 thread is about: there was a race
> condition reachable between two sessions.

This is true, so the issue 2014 thread is more critical than the
current one and which has been fixed.

> If the answer to this question is yes, I'd agree that this is
> something that deserves a closer lookup.

Generally yes.. But I can't stop thinking the Q3 -> Q2 -> Q1 when I want
to accept this asnwer.

> And before you ask: attempting to interact with a 2PC
> state from a second session with a first session waiting between these
> two points would not work: the 2PC entry is locked, cleaned up after
> EndPrepare() and PostPrepare_Locks() at PostPrepare_Twophase().
> Trying to request an access to this entry fails, as the first backend
> is marked as locking it. A second backend attempting to lock it would
> fail, complaining that the 2PC entry with a GXID is "busy".

I can understand what you are saying now, but what does it make
difference on the above case?

> SyncRepWaitForLSN() would be a problematic pattern between the
> EndPrepare() and the PostPrepare_Locks(), but we never ERROR there on
> purpose: even if we cancel while waiting for a transaction commit we'd
> just get a WARNING, meaning that we'd be able to transfer our locks
> anyway.

again Q2 -> Q3.

> Or perhaps you have a realistic scenario where it is possible to mess
> up with the shared state, outside a elog(ERROR) forced between these
> two points?

No really. I just inject some exception on some predefined places. I even
don't know why people defined this places before. As for me, I prefer
to know MORE design points for the CRITIAL SECTION besides what I side
before.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2026-03-25 03:28:23 Re: Reduce log level of some logical decoding messages to DEBUG1
Previous Message Bertrand Drouvot 2026-03-25 03:25:07 Re: relfilenode statistics