RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-04-15 07:57:49
Message-ID: OSBPR01MB2552707A847936E6803CFAA5F5092@OSBPR01MB2552.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

> Vitaly, does the minimal solution provided by the proposed patch
> (Allow to alter two_phase option of a subscriber provided no uncommitted
> prepared transactions are pending on that subscription.) address your use case?

I think that, as a first step, we do not have to handle cases in which there are
prepared transactions on the publisher/subscriber. Handling them adds complexity,
and we do not have a smart solution, especially for problem 2.
IIUC, the minimal solution still meets Vitaly's condition, right?

> > 1. While toggling two_phase from true to false, we could probably get a list of
> > prepared transactions for this subscriber id and rollback/abort the prepared
> > transactions. This will allow the transactions to be re-applied like a normal
> > transaction when the commit comes. Alternatively, if this isn't appropriate doing it
> > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared
> > transactions of this subscription in a list and when the corresponding xid is being
> > committed by the apply worker, prior to commit, we make sure the previously
> > prepared transaction is rolled back. But this would add the overhead of checking
> > this list every time a transaction is committed by the apply worker.
> >
>
> In the second solution, if you check at the time of commit whether
> there exists a prior prepared transaction then won't we end up
> applying the changes twice? I think we can first try to achieve it at
> the time of Alter Subscription because the other solution can add
> overhead at each commit?

Yeah, at least the second solution might be problematic. I prototyped
the first one and it worked well. However, to keep the feature consistent,
prepared transactions are not allowed to exist on the subscriber for now.
We can relax this restriction later, based on requirements.

> > 2. No solution yet.
> >
>
> One naive idea is that on the publisher we can remember whether the
> prepare has been sent and if so then only send commit_prepared,
> otherwise send the entire transaction. On the subscriber-side, we
> somehow, need to ensure before applying the first change whether the
> corresponding transaction is already prepared and if so then skip the
> changes and just perform the commit prepared. One drawback of this
> approach is that after restart, the prepare flag wouldn't be saved in
> the memory and we end up sending the entire transaction again. One way
> to avoid this overhead is that the publisher before sending the entire
> transaction checks with subscriber whether it has a prepared
> transaction corresponding to the current commit. I understand that
> this is not a good idea even if it works but I don't have any better
> ideas. What do you think?

I considered it, but I am not sure it is good to add such a mechanism. Your idea
requires an additional wait-loop, which might lead to bugs and unexpected
behavior. It may also degrade performance, depending on the network environment.
As for the other solution (the worker sends a list of prepared transactions), it
is also not so good because the list of prepared transactions may be huge.

Based on the above, I think we can reject the case for now.

FYI - We also considered the idea that the walsender waits until all prepared
transactions are resolved before decoding and sending changes, but it did not
work well: the restarted walsender sent only the COMMIT PREPARED record for
transactions which had been prepared before disabling the subscription.
This happened because
1) if the two_phase option of the slot is false, confirmed_flush can be ahead of
the PREPARE record, and
2) after altering and restarting, start_decoding_at becomes the same as
confirmed_flush, and records behind it won't be decoded.
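
To illustrate the two points above, here is a toy model in Python. It is only a
sketch, not PostgreSQL code: LSNs are plain integers, the WAL is a list, and the
names Record and decode_from are hypothetical. It assumes that a restarted
walsender decodes only records at or after start_decoding_at.

```python
from dataclasses import dataclass


@dataclass
class Record:
    lsn: int    # simplified LSN (assumption: a plain integer)
    kind: str   # "PREPARE" or "COMMIT PREPARED"
    gid: str


def decode_from(wal, start_decoding_at):
    """Return the records a restarted walsender would decode in this toy model."""
    return [r for r in wal if r.lsn >= start_decoding_at]


wal = [
    Record(100, "PREPARE", "gid_1"),
    Record(200, "COMMIT PREPARED", "gid_1"),
]

# 1) With two_phase = false the PREPARE is not sent at prepare time, so
#    confirmed_flush may already be ahead of the PREPARE record (LSN 100).
confirmed_flush = 150

# 2) After the ALTER and restart, start_decoding_at equals confirmed_flush.
start_decoding_at = confirmed_flush

kinds = [r.kind for r in decode_from(wal, start_decoding_at)]
print(kinds)
```

In this model the PREPARE at LSN 100 lies behind start_decoding_at, so only the
COMMIT PREPARED is decoded and the subscriber has nothing to commit.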

> > 3. We could mandate that the altering of two_phase state only be done after
> > disabling the subscription, just like how it is handled for failover option.
> >
>
> makes sense.

OK, this spec was added.

Based on the above, I updated the patch together with Ajin.
0001 - extends the ALTER SUBSCRIPTION statement. Tab-completion was also added.
0002 - mandates that the subscription has been disabled. Since there is no need
to change AtEOXact_ApplyLauncher(), that change is reverted.
If there are no objections, this can be merged into 0001.
0003 - checks whether there are transactions prepared by the worker. If found,
rejects the ALTER SUBSCRIPTION command.
0004 - checks whether there are transactions prepared on the publisher. The
backend connects to the publisher to confirm this. If found, rejects the ALTER
SUBSCRIPTION command.
0005 - adds TAP tests for the above.
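
The checks in 0002-0004 can be summarized with the following rough Python
sketch. It is not the patch code: the function, its parameters and the error
messages are all hypothetical. It also assumes, based on the descriptions above
and the attachment names, that the subscriber-side check applies when altering
from true to false and the publisher-side check applies when altering from
false to true.

```python
class AlterRejected(Exception):
    """Raised when this toy check refuses to change two_phase."""


def check_alter_two_phase(enabled, old, new,
                          subscriber_prepared, publisher_prepared):
    """Toy version of the checks; the two lists hold pending prepared GIDs."""
    if old == new:
        return  # nothing to change
    if enabled:
        # 0002: the subscription must be disabled first
        raise AlterRejected("subscription must be disabled")
    if old and not new and subscriber_prepared:
        # 0003: transactions prepared by the apply worker are still pending
        raise AlterRejected("prepared transactions exist on subscriber")
    if not old and new and publisher_prepared:
        # 0004: transactions prepared on the publisher are still pending
        raise AlterRejected("prepared transactions exist on publisher")


def try_alter(*args):
    """Return 'ok' or the rejection reason, for easy experimenting."""
    try:
        check_alter_two_phase(*args)
        return "ok"
    except AlterRejected as e:
        return str(e)


print(try_alter(False, True, False, [], []))
print(try_alter(True, True, False, [], []))
print(try_alter(False, True, False, ["gid_1"], []))
print(try_alter(False, False, True, [], ["gid_2"]))
```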

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
v3-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch application/octet-stream 17.7 KB
v3-0002-Mandate-the-subscription-has-been-disabled.patch application/octet-stream 6.2 KB
v3-0003-Prohibit-altering-from-true-to-false-if-there-are.patch application/octet-stream 3.6 KB
v3-0004-Prohibit-altering-from-false-to-true-if-there-are.patch application/octet-stream 4.9 KB
v3-0005-Add-TAP-tests-for-altering-two_phase-option.patch application/octet-stream 3.7 KB
