Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Date: 2021-09-20 04:41:43
Message-ID: 20210920044143.GA3414844@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
> proc->pgprocno = gxact->pgprocno;
> SHMQueueElemInit(&(proc->links));
> proc->waitStatus = PROC_WAIT_STATUS_OK;
> - /* We set up the gxact's VXID as InvalidBackendId/XID */
> - proc->lxid = (LocalTransactionId) xid;
> + /* We set up the gxact's VXID as real for CIC purposes */
> + proc->lxid = MyProc->lxid;

This breaks the case where the server restarted after PREPARE TRANSACTION.
MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
false. I'm attaching a test case addition. Can you repair this?

> proc->xid = xid;
> Assert(proc->xmin == InvalidTransactionId);
> proc->delayChkpt = false;
> proc->statusFlags = 0;
> proc->pid = 0;
> - proc->backendId = InvalidBackendId;
> + /* May be backendId of startup process */
> + proc->backendId = MyBackendId;

Incidentally, MyBackendId of startup process depends on other facts. When
using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
Otherwise, including clean startup of a non-standby node, MyBackendId is
InvalidBackendId. This may be harmless. I didn't know about it.

On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> > > 29 авг. 2021 г., в 23:09, Andres Freund <andres(at)anarazel(dot)de> написал(а):
> > >>> It seems like it's going to add a substantial amount of work even when
> > >>> no 2PC xacts are involved?
> > >> Only if 2PCs are enabled.
> > >
> > > I don't think that's good enough. Plenty of systems have 2PC enabled but very
> > > few if any transactions end up as 2PC ones.
>
> > Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
> >
> > Does it worth the complexity?
>
> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> bottleneck. Nonetheless, if Andres Freund finds it's worth the complexity,
> then I'm content with it. I'd certainly expect some performance benefit.
> Andres, what do you think?

A few more benefits (beyond lock contention) come to mind:

- Looking at the three VirtualXactLock() callers, waiting for final
disposition of prepared transactions is necessary for
WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
dead code for ResolveRecoveryConflictWithVirtualXIDs(). In
WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
prepared transaction won't do further database reads. Waiting on the
prepared transaction there could give CIC an arbitrarily-long, needless
delay. ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
prepared transaction, because prepared transactions hold no locks during
recovery. (If a prepared transaction originally acquired
AccessExclusiveLock, the startup process holds that lock on its behalf.)
Coordinating the XID search at a higher layer would let us change
WaitForLockersMultiple() without changing the others.

- v13 WaitPreparedXact() experiences starvation when a steady stream of
prepared transactions have the same VXID. Since VXID reuse entails
reconnecting, starvation will be unnoticeable in systems that follow best
practices around connection lifespan. The 2021-08-23 patch version didn't
have that hazard.

None of those benefits clearly justify the complexity, but they're relevant to
the decision.

Attachment Content-Type Size
test-2pc-wait-after-recovery-v0.patch text/plain 1.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2021-09-20 06:14:56 BUG #17196: old_snapshot_threshold is not honored if there is a transaction
Previous Message Tom Lane 2021-09-19 16:03:04 Re: Query planning on partitioned table causes postgres 13.4 to consume all memory