Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-04-10 11:18:26
Message-ID: CAA4eK1K1fSkeK=kc26G5cq87vQG4=1qs_b+no4+ep654SeBy1w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 4:59 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>>
>> I think this would probably be better than the current situation but
>> can we think of a solution to allow toggling the value of two_phase
>> even when prepared transactions are present? Can you please summarize
>> the reason for the problems in doing that and the solutions, if any?
>>
>
>
> Updated the patch, as it wasn't addressing updating of two-phase in the remote slot.
>

Vitaly, does the minimal solution provided by the proposed patch
(allow altering the two_phase option of a subscriber, provided no
uncommitted prepared transactions are pending on that subscription)
address your use case?

> Currently the main issue that needs to be handled is pending prepared transactions while two_phase is being altered. I see 3 issues with the current approach.
>
> 1. Uncommitted prepared transactions when toggling two_phase from true to false
>   When two_phase was true, prepared transactions were decoded at PREPARE time and sent to the subscriber, where they were prepared with a new gid. Once two_phase is toggled to false, the COMMIT PREPARED on the publisher is converted to a commit and the entire transaction is decoded and sent to the subscriber. This leaves the previously prepared transaction pending on the subscriber.
>
> 2. Uncommitted prepared transactions when toggling two_phase from false to true
>   When two_phase was false, prepared transactions were ignored and not decoded at PREPARE time on the publisher. Once two_phase is toggled to true, the apply worker and the walsender are restarted and replication is restarted from a new "start_decoding_at" LSN. This new "start_decoding_at" could be past the LSN of the PREPARE record, and if so, the PREPARE record is skipped and not sent to the subscriber. See the comments in DecodeTXNNeedSkip() for details. Later, when the user issues COMMIT PREPARED, it is decoded and sent to the subscriber, but there is no prepared transaction on the subscriber, so it fails because the corresponding gid of the transaction cannot be found.
>
> 3. While altering the two_phase of the subscription, it is required to also alter the two_phase field of the slot on the primary. The subscription cannot remotely alter the two_phase option of the slot when the subscription is enabled, as the slot is owned by the walsender on the publisher side.
>

Thanks for summarizing the reasons for not allowing altering the
two_phase property of a subscription.
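
Issues 1 and 2 above can be illustrated with a toy model. This is only a
sketch, not PostgreSQL code: WalRecord, decode() and apply_on_subscriber()
are hypothetical names, and the decoding rule is simplified to "skip
every record before start_decoding_at".

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    lsn: int
    kind: str  # "PREPARE" or "COMMIT PREPARED"
    gid: str


def decode(wal, start_decoding_at, two_phase):
    """Messages a toy walsender emits for records at or after start_decoding_at."""
    sent = []
    for rec in wal:
        if rec.lsn < start_decoding_at:
            continue  # record lies before the restart point, so it is skipped
        if rec.kind == "PREPARE" and two_phase:
            sent.append(("prepare", rec.gid))
        elif rec.kind == "COMMIT PREPARED":
            # With two_phase off, the whole transaction goes out as a plain commit.
            sent.append(("commit_prepared" if two_phase else "commit", rec.gid))
    return sent


def apply_on_subscriber(messages, prepared):
    """Apply messages on a toy subscriber; return (still_prepared, missing_gids)."""
    prepared = set(prepared)
    missing = []
    for kind, gid in messages:
        if kind == "prepare":
            prepared.add(gid)
        elif kind == "commit_prepared":
            if gid in prepared:
                prepared.remove(gid)
            else:
                missing.append(gid)  # COMMIT PREPARED for an unknown gid fails
        # A plain "commit" applies the changes afresh and leaves `prepared` alone.
    return prepared, missing


wal = [WalRecord(100, "PREPARE", "tx1"), WalRecord(300, "COMMIT PREPARED", "tx1")]

# Issue 1: prepared under two_phase = true, committed after toggling to false.
prepared, _ = apply_on_subscriber(decode(wal[:1], 0, two_phase=True), set())
issue1_left_prepared, _ = apply_on_subscriber(
    decode(wal, 200, two_phase=False), prepared
)

# Issue 2: PREPARE ignored under two_phase = false, restart point past it.
_, issue2_missing = apply_on_subscriber(decode(wal, 200, two_phase=True), set())
```

In this model, issue 1 leaves tx1 in the subscriber's prepared set after
the commit, and issue 2 produces a commit_prepared for a gid the
subscriber never saw.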

> Possible solutions for the 3 problems:
>
> 1. While toggling two_phase from true to false, we could probably get a list of prepared transactions for this subscriber id and rollback/abort the prepared transactions. This will allow the transactions to be re-applied like a normal transaction when the commit comes. Alternatively, if doing this in the ALTER SUBSCRIPTION context isn't appropriate, we could store the xids of all prepared transactions of this subscription in a list, and when the corresponding xid is being committed by the apply worker, make sure the previously prepared transaction is rolled back prior to the commit. But this would add the overhead of checking this list every time a transaction is committed by the apply worker.
>

In the second solution, if you check at the time of commit whether
there exists a prior prepared transaction, then won't we end up
applying the changes twice? I think we can first try to achieve it at
the time of ALTER SUBSCRIPTION, because the other solution can add
overhead at each commit.
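
As a rough sketch of the first option (rolling back at ALTER SUBSCRIPTION
time), the helper below picks the pending prepared transactions that
belong to one subscription and builds the statements to roll them back.
It assumes the apply worker's gids follow a pg_gid_<subid>_<xid> pattern;
that pattern, the function name and the sample OIDs are assumptions for
illustration and not part of the proposed patch.

```python
import re

# Assumed gid layout for transactions prepared by an apply worker.
GID_RE = re.compile(r"^pg_gid_(\d+)_(\d+)$")


def rollback_statements(subid, prepared_gids):
    """Build ROLLBACK PREPARED statements for gids owned by subscription subid."""
    statements = []
    for gid in prepared_gids:
        match = GID_RE.match(gid)
        if match and int(match.group(1)) == subid:
            statements.append(f"ROLLBACK PREPARED '{gid}'")
    return statements


# Hypothetical contents of the gid column of pg_prepared_xacts on the subscriber.
pending = ["pg_gid_16394_750", "pg_gid_16400_751", "user_gid"]
to_run = rollback_statements(16394, pending)
```

Only the first gid matches subscription 16394 here; gids of other
subscriptions and user-created prepared transactions are left alone.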

> 2. No solution yet.
>

One naive idea is that on the publisher we can remember whether the
prepare has been sent and, if so, send only commit_prepared; otherwise
send the entire transaction. On the subscriber side, we somehow need
to check, before applying the first change, whether the corresponding
transaction is already prepared and, if so, skip the changes and just
perform the commit prepared. One drawback of this approach is that the
prepare flag is kept only in memory, so after a restart it is lost and
we end up sending the entire transaction again. One way to avoid this
overhead is for the publisher, before sending the entire transaction,
to check with the subscriber whether it has a prepared transaction
corresponding to the current commit. I understand that this is not a
good idea even if it works, but I don't have any better ideas. What do
you think?
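
The idea, including the restart drawback, can be sketched as a toy model.
ToyPublisher, ToySubscriber and the message names are hypothetical and
only mirror the description above; nothing here is existing PostgreSQL
behaviour.

```python
class ToyPublisher:
    def __init__(self):
        self.prepare_sent = set()  # in memory only, so a restart empties it

    def on_prepare(self, gid):
        self.prepare_sent.add(gid)
        return ("prepare", gid)

    def on_commit_prepared(self, gid):
        if gid in self.prepare_sent:
            return ("commit_prepared", gid)
        return ("full_transaction", gid)  # prepare not known to have been sent


class ToySubscriber:
    def __init__(self):
        self.prepared = set()
        self.committed = []

    def receive(self, message):
        kind, gid = message
        if kind == "prepare":
            self.prepared.add(gid)
            return "prepared"
        if gid in self.prepared:
            # Already prepared: skip any resent changes, just commit prepared.
            self.prepared.remove(gid)
            self.committed.append(gid)
            return "commit prepared"
        if kind == "full_transaction":
            self.committed.append(gid)
            return "apply changes and commit"
        raise LookupError(f"no prepared transaction for {gid}")


pub, sub = ToyPublisher(), ToySubscriber()
sub.receive(pub.on_prepare("tx1"))

pub = ToyPublisher()  # publisher restart: the in-memory flag is gone
message = pub.on_commit_prepared("tx1")
outcome = sub.receive(message)
```

After the simulated restart the publisher resends the entire transaction,
and the subscriber-side check is what keeps the changes from being
applied twice.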

> 3. We could mandate that the two_phase state only be altered after disabling the subscription, just as is done for the failover option.
>

Makes sense.

--
With Regards,
Amit Kapila.
