RE: Fix slot synchronization with two_phase decoding enabled

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: Fix slot synchronization with two_phase decoding enabled
Date: 2025-05-09 02:13:01
Message-ID: TYAPR01MB5724D116E52E89933A91D057948AA@TYAPR01MB5724.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Thu, May 8, 2025 at 6:04 PM Zhijie Hou (Fujitsu) wrote:
>
> On Tue, May 6, 2025 at 7:22 PM Zhijie Hou (Fujitsu) wrote:
>
> >
> > On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote:
> > >
> > > On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada
> > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > While I cannot be entirely certain of my analysis, I believe the
> > > > root cause might be related to the backward movement of the
> > > > confirmed_flush LSN. The following scenario seems possible:
> > > >
> > > > 1. The walsender enables the two_phase and sets two_phase_at
> > > > (which should be the same as confirmed_flush).
> > > > 2. The slot's confirmed_flush regresses for some reason.
> > > > 3. The slotsync worker retrieves the remote slot information and
> > > > enables two_phase for the local slot.
> > > >
> > >
> > > Yes, this is possible. Here is my theory as to how it can happen
> > > in the current case. In the failed test, after the primary has
> > > prepared a transaction, the transaction won't be replicated to the
> > > subscriber as two_phase was not enabled for the slot. However,
> > > subsequent keepalive messages can send the latest WAL location to
> > > the subscriber and get the confirmation of the same from the
> > > subscriber without its origin being moved. Now, after we restart
> > > the apply worker (due to disable/enable for a subscription), it
> > > will use the previous origin_lsn to temporarily move back the
> > > confirmed flush LSN as explained in one of the previous emails in another thread [1].
> > > During this temporary movement of confirm flush LSN, the slotsync
> > > worker fetches the two_phase_at and confirm_flush_lsn values,
> > > leading to the assertion failure. We see this issue intermittently
> > > because it depends on the timing of slotsync worker's request to
> > > fetch the slot's value.
> >
> > Based on this theory, I can reproduce the BF failure in the 040
> > tap-test on HEAD after applying the 0001 patch. This is achieved by
> > using the injection point to stop the walsender from sending a
> > keepalive before receiving the old origin position from the apply
> > worker, ensuring the confirmed_flush consistently moves backward
> > before slotsync.
> >
> > Additionally, I've reproduced the duplicate data issue on HEAD
> > without slotsync using the attached script (after applying the injection point patch).
> > This issue arises if we immediately disable the subscription after
> > the confirm_flush_lsn moves backward, preventing the walsender from
> > advancing the confirm_flush_lsn.
> >
> > In this case, if a prepared transaction exists before two_phase_at,
> > then after re-enabling the subscription, it will replicate that
> > prepared transaction when decoding the PREPARE record and replicate
> > that again when decoding the COMMIT PREPARED record. In such cases,
> > the apply worker keeps reporting the error:
> >
> > ERROR: transaction identifier "pg_gid_16387_755" is already in use.
> >
> > Apart from the above, we're investigating whether the same issue can
> > occur in back-branches and will share the results once ready.
>
> I reproduced the duplicate data issue on PG17 as well using the
> attached shell script. Since PG17 doesn’t allow altering the two_phase
> option, I created a subscription with two_phase=on and copy_data=on. I
> prepared a transaction before the table synchronization was ready, at
> a time when the slot's two_phase hadn't been set to true. This setup
> can result in the prepared transaction being replicated twice after
> the apply worker restarts and the confirmed_flush_lsn moves backward.
>
> To ensure the origin position is initialized during table sync, I
> inserted some data before the prepared transaction. I added injection
> points (0001) to control the table sync worker's progress, allowing the
> apply worker to replicate some changes and update the origin position
> while table sync was ongoing.

The above reproduction of the issue indicates that it has been present since at
least PG15, when the two_phase subscription option was introduced. I am
currently investigating whether the issue occurs without the two_phase option.
If it does, the fix will need to be applied to all supported branches. I will
share the results once they are available.
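
As a rough illustration of the duplicate-replication scenario described in the
quoted text above, here is a toy model in Python. It is only a sketch under
stated assumptions, not PostgreSQL's decoding code: LSNs are plain integers,
and the names Slot, decode, and the message labels are hypothetical. It assumes
that decoding restarts from confirmed_flush and that a COMMIT PREPARED whose
PREPARE lies before two_phase_at sends the whole transaction.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    confirmed_flush: int  # toy LSN; decoding is assumed to restart from here
    two_phase_at: int     # toy LSN at which two_phase was enabled


def decode(wal, slot):
    """Return the messages a toy two_phase decoder would send for `wal`."""
    sent = []
    prepare_lsn = {}  # gid -> LSN of its PREPARE record
    for lsn, kind, gid in wal:
        if kind == "PREPARE":
            prepare_lsn[gid] = lsn
            # Records before the restart point count as already confirmed.
            if lsn >= slot.confirmed_flush:
                sent.append(("prepare", gid))
        elif kind == "COMMIT PREPARED" and lsn >= slot.confirmed_flush:
            if prepare_lsn[gid] < slot.two_phase_at:
                # The prepare predates two_phase_at, so it is assumed to have
                # been skipped earlier and the whole transaction is sent now.
                sent.append(("replay_whole_txn", gid))
            else:
                sent.append(("commit_prepared", gid))
    return sent


# A transaction prepared before two_phase_at (200) and committed after it.
wal = [(100, "PREPARE", "gid_1"), (300, "COMMIT PREPARED", "gid_1")]

normal = decode(wal, Slot(confirmed_flush=200, two_phase_at=200))
regressed = decode(wal, Slot(confirmed_flush=50, two_phase_at=200))
print(normal)
print(regressed)
```

In this model the transaction is sent once when confirmed_flush equals
two_phase_at, and twice (at PREPARE and again at COMMIT PREPARED) when
confirmed_flush has moved back before the PREPARE record.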

Best Regards,
Hou zj
