Re: Prepared transaction releasing locks before deregistering its GID

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Oleksii Kliukin <alexk(at)hintbits(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prepared transaction releasing locks before deregistering its GID
Date: 2019-02-19 03:26:40
Message-ID: 20190219032640.GO15532@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote:
> That looks like a race condition to me. What happens is that another
> transaction with the name identical to the running one can start and proceed
> to the prepare phase while the original one commits, failing at last instead
> of waiting for the original one to finish.

It took me 50 clients and a bit more than 20 seconds, but I have been
able to reproduce the problem with one error. Thanks for the
reproducer. This is indeed a race condition with 2PC.

> By looking at the source code of FinishPreparedTransaction() I can see the
> RemoveGXact() call, which removes the prepared transaction from
> TwoPhaseState->prepXacts. It is placed at the very end of the function,
> after the post-commit callbacks that clear out the locks held by the
> transaction. Those callbacks are not guarded by the TwoPhaseStateLock,
> resulting in a possibility for a concurrent session to proceed will
> MarkAsPreparing after acquiring the locks released by them.

Hm, I see. Taking a breakpoint just after ProcessRecords() or putting
a sleep there makes the issue plain. The same issue can happen with
predicate locks.

> I couldn’t find any documentation on the expected outcome in cases like
> this, so I assume it might not be a bug, but an undocumented behavior.

If you run two transactions in parallel using your script, the second
transaction would wait at LOCK time until the first transaction
releases its locks with the COMMIT PREPARED.

> Should I go about and produce a patch to put a note in the description of
> commit prepared, or is there any interest in changing the behavior to avoid
> such conflicts altogether (perhaps by holding the lock throughout the
> cleanup phase)?

That's a bug, let's fix it. I agree with your suggestion that we had
better not release the 2PC lock using the callbacks for COMMIT
PREPARED or ROLLBACK PREPARED until the shared memory state is
cleared. At the same time, I think that we should be smarter in the
ordering of the actions: we care about predicate locks here, but not
about the on-disk file removal and the stat counters. One trick is
that the post-commit callback calls TwoPhaseGetDummyProc() which would
try to take TwoPhaseStateLock which needs to be applied so we need
some extra logic to not take a lock in this case. From what I can see
this is older than 9.4 as the removal of the GXACT entry in shared
memory and the post-commit hooks are out of sync for a long time :(

Attached is a patch that fixes the problem for me. Please note the
safety net in TwoPhaseGetGXact().
--
Michael

Attachment Content-Type Size
2pc-commit-locks.patch text/x-diff 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-02-19 03:36:48 Re: Reaping Temp tables to avoid XID wraparound
Previous Message Andres Freund 2019-02-19 03:24:20 Re: Speed up transaction completion faster after many relations are accessed in a transaction