Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-13 08:35:34
Message-ID: CAJpy0uD6c3SB+UBG7ULTDydb=xUDfG4EAa6jkwaCVTZFxwrN+w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On 10/6/23 6:48 PM, Amit Kapila wrote:
> > > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >>
> > >> On 10/4/23 1:50 PM, shveta malik wrote:
> > >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >>>>
> > >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
> > >>>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >>>>>
> > >>>>> On 10/4/23 6:26 AM, shveta malik wrote:
> > >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> How about an alternate scheme where we define sync_slot_names on
> > >>>>>>> standby but then store the physical_slot_name in the corresponding
> > >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> > >>>>>>> standby will send the list of 'sync_slot_names' and the primary will
> > >>>>>>> add the physical standby's slot_name in each of the corresponding
> > >>>>>>> sync_slot. Now, if we do this then even after restart, we should be
> > >>>>>>> able to know for which physical slot each logical slot needs to wait.
> > >>>>>>> We can even provide an SQL API to reset the value of
> > >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in
> > >>>>>>> case of emergency (for example, when the corresponding physical
> > >>>>>>> standby never comes up).
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Looks like a better approach to me. It solves most of the pain points like:
> > >>>>>> 1) Avoids the need of multiple GUCs
> > >>>>>> 2) Primary and standby need not worry about staying in sync, as they
> > >>>>>> would if we maintained the sync-slot-names GUC on both
> > >>>>
> > >>>> As per my understanding of this approach, we don't want
> > >>>> 'sync-slot-names' to be set on the primary. Do you have a different
> > >>>> understanding?
> > >>>>
> > >>>
> > >>> Same understanding. We do not need it to be set on primary by user. It
> > >>> will be GUC on standby and standby will convey it to primary.
> > >>
> > >> +1, same understanding here.
> > >>
> > >
> > > At PGConf NYC, I had a brief discussion on this topic with Andres
> > > where yet another approach to achieve this came up.
> >
> > Great!
> >
> > > Have a parameter
> > > like enable_failover at the slot level (this will be persistent
> > > information). Users can set it during the create/alter subscription or
> > > via pg_create_logical_replication_slot(). Also, on physical standby,
> > > there will be a parameter like enable_syncslot. All the physical
> > > standbys that have set enable_syncslot will receive all the logical
> > > slots that are marked as enable_failover. To me, whether to sync a
> > > particular slot is a slot-level property, so defining it in this new
> > > way seems reasonable.
> >
> > Yeah, as this is a slot-level property, I agree that this seems reasonable.
> >
> > Also that sounds more natural to me with this approach. The primary
> > is really the one that "drives" which slots can be synced. I like it.
> >
> > One could also set enable_failover while creating a logical slot on a physical
> > standby (so that cascading standbys could also have extra slots to sync
> > compared to "level 1" standbys).
> >
> > >
> > > I think this will simplify the scheme a bit but still, the list of
> > > physical standby's for which logical slots wait during decoding needs
> > > to be maintained as we thought.
> >
> > Right.
> >
> > > But, how about with the above two
> > > parameters (enable_failover and enable_syncslot), we have
> > > standby_slot_names defined on the primary. That avoids the need to
> > > store the list of standby_slot_names in logical slots and simplifies
> > > the implementation quite a bit, right?
> >
> > Agree.
> >
> > > Now, one can think if we have a
> > > parameter like 'standby_slot_names' then why do we need
> > > enable_syncslot on physical standby but that will be required to
> > > invoke sync worker which will pull logical slot's information?
> >
> > yes and enable_syncslot on the standby could also be used to "pause"
> > the sync on standbys (by disabling the parameter) if one would want to
> > (without the need to modify anything on the primary).
> >
> > > The
> > > advantage of having standby_slot_names defined on primary is that we
> > > can selectively wait on the subset of physical standbys where we are
> > > syncing the slots.
> >
> > Yeah and this flexibility/filtering looks somehow mandatory to me.
> >
> > > I think this will be something similar to
> > > 'synchronous_standby_names' in the sense that the physical standbys
> > > mentioned in standby_slot_names will behave as synchronous copies with
> > > respect to slots and after failover user can switch to one of these
> > > physical standby and others can start following new master/publisher.
> > >
> > > Thoughts?
> >
> > I like the idea and I think that's the one that seems the more reasonable
> > to me. I'd vote for this idea with:
> >
> > - standby_slot_names on the primary (could also be set on standbys in case of
> > cascading context)
> > - enable_failover at logical slot creation + API to enable/disable it at wish
> > - enable_syncslot on the standbys
> >
>
> Thank You Amit and Bertrand for feedback on the new design.
>
> PFA v23 patch set which attempts to implement the new proposed design
> to handle sync candidates:
> a) The synchronize_slot_names GUC is removed. Instead the
> 'enable_failover' property is added at the slot level which is
> persistent. It can be set by the user using create-subscription
> command. eg: create subscription mysub connection '....' publication
> mypub WITH (enable_failover = true);
> b) New GUC enable_syncslot is added on standbys to enable or disable
> slot sync on standbys
> c) standby_slot_names are maintained on primary.
>
> Patch 0002 also addresses Peter's comments dated Oct 6 and Oct 10.
>
> Thank You Ajin for implementing 'create subscription' cmd changes to
> support 'enable_failover' syntax.
>
> This patch does not yet implement the following; it will be done in the next version:
> --Provide support to set/alter enable_failover using
> alter-subscription and pg_create_logical_replication_slot
> --Changes needed to support slot-synchronization on cascading standbys
> --Display "enable_failover" property in pg_replication_slots. I think
> it makes sense to do this.
>
> thanks
> Shveta
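
As a rough illustration of point (c) in the quoted summary: a logical
walsender for a failover-enabled slot should not send changes beyond what
every physical standby listed in standby_slot_names has confirmed. The
Python sketch below is only a model of that rule, not code from the patch.
PhysicalSlot, flushed_lsn and max_sendable_lsn are hypothetical names, LSNs
are simplified to integers, and treating a listed but missing slot as
"block everything" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PhysicalSlot:
    name: str
    flushed_lsn: int  # hypothetical: WAL position this standby has confirmed


def max_sendable_lsn(wal_end, enable_failover, standby_slot_names, physical_slots):
    """Return the highest LSN a logical walsender may send for one slot."""
    # Slots not marked enable_failover, or an empty standby_slot_names,
    # impose no wait in this model.
    if not enable_failover or not standby_slot_names:
        return wal_end
    by_name = {s.name: s for s in physical_slots}
    limit = wal_end
    for name in standby_slot_names:
        slot = by_name.get(name)
        if slot is None:
            # Assumption: a listed standby without a slot has confirmed nothing.
            return 0
        limit = min(limit, slot.flushed_lsn)
    return limit
```

With standbys sb1 and sb2 confirmed up to 500 and 300, a failover-enabled
slot would be held at 300 even if the primary's WAL has reached 1000.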

PFA the v24 patch set, which has the following changes:

1) 'enable_failover' displayed in pg_replication_slots.
2) Support for 'enable_failover' in
pg_create_logical_replication_slot(). It is an optional argument with
default value false.
3) Addressed pending comments (1-30) from Peter in [1].
4) Fixed an issue in patch 0002 due to which even slots with
enable_failover=false were getting synced.
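
To make the intended behavior behind item 4 concrete, here is a minimal
Python sketch of the selection rule as I understand it from this thread: a
standby syncs a slot only if enable_syncslot is on for that standby and the
slot itself is marked enable_failover. LogicalSlot and slots_to_sync are
hypothetical names used for illustration; this is not the patch code.

```python
from dataclasses import dataclass


@dataclass
class LogicalSlot:
    name: str
    enable_failover: bool = False  # defaults to false, as in item 2


def slots_to_sync(primary_slots, enable_syncslot):
    """Return names of the primary's logical slots a standby should sync."""
    # With enable_syncslot off, the standby syncs nothing.
    if not enable_syncslot:
        return []
    # Only slots explicitly marked enable_failover are candidates.
    return [s.name for s in primary_slots if s.enable_failover]
```

The bug fixed in item 4 corresponds to the filter on enable_failover being
skipped, so that slots left at the default were synced as well.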

The changes for 1 and 2 are in patch 0001, while 3 and 4 are in patch 0002.

Thanks Ajin, for working on 1 and 3.

[1]: https://www.postgresql.org/message-id/CAHut%2BPtbb3Ydx40a0p7Qovvp-4cC4ZCDreGRjmFzou8mjh2PmA%40mail.gmail.com

Next to do:
--Support for enable_failover in alter-subscription.
--Support for slot-sync on cascading standbys.

thanks
Shveta

Attachments:
  v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch (application/octet-stream, 51.5 KB)
  v24-0002-Add-logical-slot-sync-capability-to-physical-sta.patch (application/octet-stream, 109.4 KB)
