RE: Slow catchup of 2PC (twophase) transactions on replica in LR

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-04-16 02:18:29
Message-ID: OSBPR01MB25528F4B0B8178D3AA8DE2BFF5082@OSBPR01MB2552.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

> > FYI - We also considered the idea which walsender waits until all prepared
> > transactions are resolved before decoding and sending changes, but it did
> > not work well - the restarted walsender sent only COMMIT PREPARED record for
> > transactions which have been prepared before disabling the subscription.
> > This happened because
> > 1) if the two_phase option of slots is false, the confirmed_flush can be
> > ahead of PREPARE record, and
> > 2) after the altering and restarting, start_decoding_at becomes same as
> > confirmed_flush and records behind this won't be decoded.
> >
>
> I don't understand the exact problem you are facing. IIUC, if the
> commit is after start_decoding_at point and prepare was before it, we
> expect to send the entire transaction followed by a commit record. The
> restart_lsn should be before the start of such a transaction and we
> should have recorded the changes in the reorder buffer.

This behavior is right for the two_phase = false case. But if the parameter is
altered between PREPARE and COMMIT PREPARED, there is a possibility that only
COMMIT PREPARED is sent. First, the workload I executed is below.

1. created a subscription with (two_phase = false)
2. prepared a transaction on publisher
3. disabled the subscription once
4. altered the subscription to two_phase = true
5. enabled the subscription again
6. did COMMIT PREPARED on the publisher

-> The apply worker would raise an ERROR while applying the COMMIT PREPARED record:
ERROR: prepared transaction with identifier "pg_gid_XXX_YYY" does not exist

The part below describes why the ERROR occurred.

======

### Regarding 1) the confirmed_flush can be ahead of the PREPARE record

If two_phase is off, as you might know, confirmed_flush can get ahead of the
PREPARE record because of the keepalive mechanism.

The walsender sometimes sends a keepalive message in WalSndKeepalive(). The LSN
written into the message is that of the most recently decoded record. Since the
PREPARE record is skipped (it is just handled by ReorderBufferProcessXid()), the
LSN written in the message can sometimes be ahead of the PREPARE record. If the
WAL records are laid out as below, the LSN can point to CHECKPOINT_ONLINE.

...
INSERT
PREPARE txn1
CHECKPOINT_ONLINE
...

On the worker side, when it receives the keepalive, it compares the LSN in the
message with the last received LSN and advances last_received. The worker then
replies to the walsender, reporting that the record at last_received has been
flushed on the subscriber. See send_feedback().

On the publisher, when the walsender receives the message from the subscriber,
it reads the message and advances confirmed_flush to the reported value. If the
walsender has sent an LSN that is ahead of the PREPARE record, confirmed_flush
is advanced past the PREPARE record as well.
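
To make the sequence above easier to follow, here is a minimal toy model in
Python. It is only a sketch: the integer LSNs and the helper names
(keepalive_lsn, subscriber_feedback, advance_confirmed_flush) are my
assumptions for illustration and are not PostgreSQL functions. It also assumes
the worker has nothing pending locally, so it reports last_received as flushed.

```python
# Toy model only: LSNs are small integers and all names are illustrative.
WAL = [
    (100, "INSERT"),
    (110, "PREPARE txn1"),       # skipped by decoding when two_phase = false
    (120, "CHECKPOINT_ONLINE"),  # last record the walsender has read
]


def keepalive_lsn(wal):
    """LSN a keepalive would carry: the most recently decoded record."""
    return wal[-1][0]


def subscriber_feedback(last_received, keepalive):
    """Worker advances last_received and reports it as flushed."""
    return max(last_received, keepalive)


def advance_confirmed_flush(confirmed_flush, reported_flush):
    """Publisher moves confirmed_flush forward to the reported position."""
    return max(confirmed_flush, reported_flush)


prepare_lsn = 110
confirmed_flush = 100

reported = subscriber_feedback(last_received=100, keepalive=keepalive_lsn(WAL))
confirmed_flush = advance_confirmed_flush(confirmed_flush, reported)

# confirmed_flush now points at CHECKPOINT_ONLINE, beyond the PREPARE record.
print(confirmed_flush, confirmed_flush > prepare_lsn)
```

In this model confirmed_flush ends up beyond the PREPARE record even though
nothing about txn1 was ever sent to the subscriber.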

### Regarding 2) after the altering, records behind the confirmed_flush are not decoded

Next, the decoding phase. The snapshot builder determines the point where
decoding resumes, as start_decoding_at. After the restart, this value is the
same as the slot's confirmed_flush. Since confirmed_flush is ahead of the
PREPARE record, start_decoding_at is ahead of it as well, so the whole prepared
transaction is skipped and not decoded.
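
The effect can be sketched with another small Python toy model. Again, this is
only an illustration under assumptions: the LSNs are made-up integers,
sent_after_restart is a hypothetical helper, and real decoding involves
restart_lsn and the reorder buffer, which are not modelled here.

```python
# Toy model only: integer LSNs and record names are illustrative assumptions.
WAL = [
    (100, "INSERT", "txn1"),
    (110, "PREPARE", "txn1"),
    (120, "CHECKPOINT_ONLINE", None),
    (130, "COMMIT PREPARED", "txn1"),
]


def sent_after_restart(wal, start_decoding_at):
    """Return the two-phase messages a two_phase = true slot would send.

    Records located before start_decoding_at are skipped, as described above.
    """
    sent = []
    for lsn, kind, xact in wal:
        if lsn < start_decoding_at:
            continue  # behind start_decoding_at: not decoded
        if kind in ("PREPARE", "COMMIT PREPARED"):
            sent.append(f"{kind} {xact}")
    return sent


# confirmed_flush was advanced to CHECKPOINT_ONLINE by the keepalive exchange.
confirmed_flush = 120
after_alter = sent_after_restart(WAL, start_decoding_at=confirmed_flush)
print(after_alter)
```

With start_decoding_at = 120 the model sends only COMMIT PREPARED for txn1,
which matches the ERROR on the apply worker; starting from 100 instead would
send the PREPARE first.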

======

The attached zip file contains the PoC and the script I used. You can refer to them to see what I actually did.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
alter_subscription_patches.zip application/x-zip-compressed 13.0 KB
