Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-04-05 11:29:29
Message-ID: CAFPTHDa=pJSZ_4dV5DPAOapRSgPcyyUTP0WzGY2Rz_D3-gwraw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

>
> I think this would probably be better than the current situation but
> can we think of a solution to allow toggling the value of two_phase
> even when prepared transactions are present? Can you please summarize
> the reason for the problems in doing that and the solutions, if any?
>
> --
> With Regards,
> Amit Kapila.
>

I have updated the patch, as the previous version did not update two_phase
in the remote slot.

Currently, the main thing that still needs to be handled is pending
prepared transactions while two_phase is being altered. I see 3 issues with
the current approach.

1. Uncommitted prepared transactions when toggling two_phase from true to
false
When two_phase was true, prepared transactions were decoded at PREPARE time
and sent to the subscriber, where they were prepared with a new gid. Once
two_phase is toggled to false, the COMMIT PREPARED on the publisher is
converted to a plain commit, and the entire transaction is decoded and sent
to the subscriber. This leaves the previously prepared transaction pending
on the subscriber.
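
To make issue 1 concrete, here is a toy model in Python. It is only a
sketch, not PostgreSQL code: ToySubscriber, publish() and the gid "gid_a"
are names made up for this illustration, and the model ignores everything
except the two_phase flag.

```python
# Toy model only: all names here are hypothetical, not PostgreSQL internals.

class ToySubscriber:
    def __init__(self):
        self.prepared = {}    # gid -> changes waiting for COMMIT PREPARED
        self.committed = []   # changes from fully committed transactions

    def apply_prepare(self, gid, changes):
        self.prepared[gid] = list(changes)

    def apply_commit_prepared(self, gid):
        self.committed.extend(self.prepared.pop(gid))

    def apply_commit(self, changes):
        self.committed.extend(changes)


def publish(sub, event, gid, changes, two_phase):
    """Forward one publisher-side event under the current two_phase setting."""
    if event == "PREPARE":
        if two_phase:
            sub.apply_prepare(gid, changes)
        # with two_phase = false the PREPARE is not decoded at all
    elif event == "COMMIT PREPARED":
        if two_phase:
            sub.apply_commit_prepared(gid)
        else:
            # converted to a plain commit carrying the whole transaction
            sub.apply_commit(changes)


sub = ToySubscriber()
changes = ["INSERT 1"]
publish(sub, "PREPARE", "gid_a", changes, two_phase=True)
# two_phase is toggled from true to false at this point
publish(sub, "COMMIT PREPARED", "gid_a", changes, two_phase=False)

# The data arrived through the plain commit, but the earlier prepared
# transaction is never resolved.
leftover = sorted(sub.prepared)
```

In this model, leftover still contains "gid_a" after the commit, which is
the dangling prepared transaction described above.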

2. Uncommitted prepared transactions when toggling two_phase from false to
true
When two_phase was false, prepared transactions were ignored and not
decoded at PREPARE time on the publisher. Once two_phase is toggled to
true, the apply worker and the walsender are restarted, and replication
restarts from a new "start_decoding_at" LSN. This new "start_decoding_at"
could be past the LSN of the PREPARE record, in which case the PREPARE
record is skipped and not sent to the subscriber. See the comments in
DecodeTXNNeedSkip() for details. Later, when the user issues COMMIT
PREPARED, it is decoded and sent to the subscriber, but there is no
matching prepared transaction there, so the apply fails because the
corresponding gid cannot be found.
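
The same failure can be sketched with a small Python model. This is an
illustration under simplifying assumptions, not the real decoding logic:
replay(), MissingPreparedTransaction and the LSN values are invented here,
LSNs are plain integers, and any record below start_decoding_at is simply
skipped.

```python
# Toy model only: names and LSN values are hypothetical.

class MissingPreparedTransaction(Exception):
    """Raised when COMMIT PREPARED arrives for a gid the subscriber never saw."""


def replay(wal, start_decoding_at):
    """Replay (lsn, kind, gid) records with two_phase enabled.

    Returns the gids committed on the toy subscriber.
    """
    prepared = set()
    committed = []
    for lsn, kind, gid in wal:
        if lsn < start_decoding_at:
            continue  # record lies before the restart point, so it is skipped
        if kind == "PREPARE":
            prepared.add(gid)
        elif kind == "COMMIT PREPARED":
            if gid not in prepared:
                raise MissingPreparedTransaction(gid)
            prepared.remove(gid)
            committed.append(gid)
    return committed


wal = [(100, "PREPARE", "gid_a"), (300, "COMMIT PREPARED", "gid_a")]

# Restart point before the PREPARE: both records are seen and applied.
ok = replay(wal, start_decoding_at=50)

# Restart point past the PREPARE: only COMMIT PREPARED is seen.
try:
    replay(wal, start_decoding_at=200)
    failed_gid = None
except MissingPreparedTransaction as exc:
    failed_gid = exc.args[0]
```

With the restart point at 200 the PREPARE at LSN 100 is never applied, so
the later COMMIT PREPARED has nothing to commit.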

3. Altering two_phase on the subscription also requires altering the
two_phase field of the slot on the publisher. The subscription cannot
remotely alter the two_phase option of the slot while the subscription is
enabled, as the slot is owned by the walsender on the publisher side.

Possible solutions for the 3 problems:

1. While toggling two_phase from true to false, we could probably get the
list of prepared transactions for this subscription id and roll back the
prepared transactions. This would allow the transactions to be re-applied
like normal transactions when the commit comes. Alternatively, if doing
this in the ALTER SUBSCRIPTION context isn't appropriate, we could store
the xids of all prepared transactions of this subscription in a list, and
when the apply worker is about to commit the corresponding xid, make sure
the previously prepared transaction is rolled back first. But this would
add the overhead of checking this list every time a transaction is
committed by the apply worker.
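
As a rough sketch of the first option, the Python below picks out the
prepared transactions that belong to one subscription and builds the
ROLLBACK PREPARED statements for them. It assumes the apply worker names
its prepared transactions "pg_gid_<subid>_<xid>"; that format is my
assumption and should be checked against the source. rollback_statements()
and the sample gids and oids are made up for illustration.

```python
import re

# Sketch only. Assumption: subscriber-side gids look like
# "pg_gid_<subid>_<xid>". The input list stands in for the gid column
# of pg_prepared_xacts on the subscriber.

def rollback_statements(prepared_gids, subid):
    """Return ROLLBACK PREPARED statements for gids owned by subscription subid."""
    pattern = re.compile(r"pg_gid_%d_(\d+)" % subid)
    statements = []
    for gid in prepared_gids:
        # fullmatch keeps user-created gids and other subscriptions untouched
        if pattern.fullmatch(gid):
            statements.append("ROLLBACK PREPARED '%s';" % gid)
    return statements


gids = ["pg_gid_16394_741", "pg_gid_16394_745", "pg_gid_16400_742", "app_txn_1"]
stmts = rollback_statements(gids, 16394)
```

Only the two gids carrying subscription id 16394 are selected; the gid of
the other subscription and the application's own prepared transaction are
left alone.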

2. No solution yet.

3. We could mandate that two_phase only be altered after disabling the
subscription, just as is done for the failover option.
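
The rule in point 3 amounts to a guard plus a disable, alter, enable
sequence. A minimal Python sketch of that ordering follows; ToySubscription,
alter_two_phase() and the error text are invented for this illustration and
are not taken from the patch.

```python
# Toy model only: names and the error message are hypothetical.

class ToySubscription:
    def __init__(self, two_phase, enabled=True):
        self.two_phase = two_phase
        self.enabled = enabled


def alter_two_phase(sub, new_value):
    """Change two_phase, refusing while the subscription is enabled."""
    if sub.enabled:
        # While enabled, the remote slot is held by the walsender.
        raise RuntimeError("cannot set two_phase for enabled subscription")
    sub.two_phase = new_value


sub = ToySubscription(two_phase=True)

# Attempt on an enabled subscription is rejected.
try:
    alter_two_phase(sub, False)
    rejected = False
except RuntimeError:
    rejected = True

# Disable, alter, then enable again.
sub.enabled = False
alter_two_phase(sub, False)
sub.enabled = True
```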

Let me know your thoughts.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v2-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch application/octet-stream 15.1 KB
