Re: Fix slot synchronization with two_phase decoding enabled

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Date: 2025-05-29 19:24:55
Message-ID: CAD21AoBLHyX3KSGohf2jU36Jj2MDz34ECGym02vyigOCTxKydA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 29, 2025 at 2:00 AM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> On Wed, May 28, 2025 at 6:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > to
> > > On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Here are review comments for v14 patch:
> > > >
> > >
> > > Thank you for the review.
> > >
> > > > I think we need to include a basic test case where we simply create a
> > > > subscription with two_phase=true and then enable the failover via
> > > > ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
> > > > two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
> > > > can avoid creating regress_sub2?
> > > >
> > >
> > > A test has been added in 040_standby_failover_slots_sync.pl to enable
> > > failover via ALTER SUBSCRIPTION.
> >
> > Yes but the slot is originally created via SQL API, which seems
> > uncommon usage in practice. I thought it would be good to have the
> > basic steps in the tests to enable both fields.
> >
>
> Apologies for the earlier misunderstanding. I've updated
> 040_standby_failover_slots_sync.pl to modify the slot creation via the
> CREATE SUBSCRIPTION command as suggested.
>
> > > Regarding regress_sub2 in 003_logical_slots.pl :
> > > If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
> > > leads to pg_upgrade failure, as it attempts to create a slot on the
> > > new_node(upgraded version) with both two_phase and failover enabled,
> > > which is an unsupported combination.
> >
> > I think that the pg_upgrade test should cover the case where it
> > restores logical slots with both fields enabled in the new cluster.
> > When I actually tried this case, I realized that pg_upgrade doesn't
> > handle this case; it tried to create the logical slot via SQL API but
> > it failed as we don't allow it to create a logical slot with enabling
> > both two_phase and failover. How can we handle this case?
> >
>
> On further analysis, it seems feasible and safe to allow creation of
> slot and subscription with both two_phase and failover enabled during
> upgrade (pg_upgrade).
> As before starting the upgrade, pg_upgrade ensures that -
> a) All slot changes have been fully consumed.
> b) No prepared transactions exist.
> Additionally, it is also documented[1] - "Obviously, no one should be
> accessing the clusters during the upgrade."
>
> Though, if allowed, the slot may be created with a restart_lsn <
> two_phase_at. IIUC, the guarantees above make it impossible for a
> prepared transaction to exist before two_phase_at, thus preventing the
> bug case.

True, but I think we need to cover the simple pg_dump and pg_restore
case too, without pg_upgrade. Otherwise the dump file won't work if
the database has at least one subscription with two_phase and failover
enabled.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2025-05-29 20:49:57 Re: Fixing memory leaks in postgres_fdw
Previous Message David G. Johnston 2025-05-29 19:14:48 get speed help