Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-06 16:48:25
Message-ID: CAA4eK1JVey4DRfSAEHfF1kgdfY4hbb1LEhCPGexKwYe2Sm1zVQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On 10/4/23 1:50 PM, shveta malik wrote:
> > On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
> >> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >>>
> >>> On 10/4/23 6:26 AM, shveta malik wrote:
> >>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>>>>
> >>>>>
> >>>>> How about an alternate scheme where we define sync_slot_names on
> >>>>> standby but then store the physical_slot_name in the corresponding
> >>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> >>>>> standby will send the list of 'sync_slot_names' and the primary will
> >>>>> add the physical standby's slot_name in each of the corresponding
> >>>>> sync_slot. Now, if we do this then even after restart, we should be
> >>>>> able to know for which physical slot each logical slot needs to wait.
> >>>>> We can even provide an SQL API to reset the value of
> >>>>> standby_slot_names in logical slots as a way to unblock decoding in
> >>>>> case of emergency (for example, when the corresponding physical
> >>>>> standby never comes up).
> >>>>>
> >>>>
> >>>>
> >>>> Looks like a better approach to me. It solves most of the pain points like:
> >>>> 1) Avoids the need of multiple GUCs
> >>>> 2) Primary and standby need not worry about being in sync, as they
> >>>> would if we maintained the sync-slot-names GUC on both
> >>
> >> As per my understanding of this approach, we don't want
> >> 'sync-slot-names' to be set on the primary. Do you have a different
> >> understanding?
> >>
> >
> > Same understanding. We do not need it to be set on primary by user. It
> > will be GUC on standby and standby will convey it to primary.
>
> +1, same understanding here.
>

At PGConf NYC, I had a brief discussion on this topic with Andres
where yet another approach to achieve this came up. Have a parameter
like enable_failover at the slot level (this will be persistent
information). Users can set it during the create/alter subscription or
via pg_create_logical_replication_slot(). Also, on physical standby,
there will be a parameter like enable_syncslot. All the physical
standbys that have set enable_syncslot will receive all the logical
slots that are marked as enable_failover. To me, whether to sync a
particular slot is a slot-level property, so defining it in this new
way seems reasonable.
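
To make the selection rule concrete, here is a minimal sketch in Python
(a model only, not PostgreSQL code). The class names and the
slots_to_sync() helper are hypothetical; enable_failover and
enable_syncslot are the parameter names proposed above, not existing
options.

```python
from dataclasses import dataclass


@dataclass
class LogicalSlot:
    name: str
    enable_failover: bool  # proposed persistent, slot-level flag (assumption)


@dataclass
class Standby:
    name: str
    enable_syncslot: bool  # proposed standby-side parameter (assumption)


def slots_to_sync(standby, primary_slots):
    """Return the names of the logical slots this standby would receive."""
    # A standby that has not enabled slot sync receives nothing.
    if not standby.enable_syncslot:
        return []
    # Otherwise it receives every slot marked as failover-enabled.
    return [s.name for s in primary_slots if s.enable_failover]
```

Under this model, the standby needs no per-slot list: it opts in once,
and the choice of which slots are synced lives with the slots.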

I think this will simplify the scheme a bit, but the list of physical
standbys for which logical slots wait during decoding still needs to
be maintained, as we thought. But how about this: along with the above
two parameters (enable_failover and enable_syncslot), we have
standby_slot_names defined on the primary. That avoids the need to
store the list of standby_slot_names in logical slots and simplifies
the implementation quite a bit, right? Now, one may ask: if we have a
parameter like 'standby_slot_names', why do we need enable_syncslot on
the physical standby? It is still required to start the sync worker,
which pulls the logical slots' information. The advantage of having
standby_slot_names defined on the primary is that we can selectively
wait on the subset of physical standbys where we are syncing the
slots. I think this will be something similar to
'synchronous_standby_names' in the sense that the physical standbys
mentioned in standby_slot_names will behave as synchronous copies with
respect to slots, and after failover the user can switch to one of
these physical standbys and the others can start following the new
master/publisher.
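
A rough sketch of the wait condition this implies, again as a Python
model rather than actual PostgreSQL code. The function name, the use of
plain integers for LSNs, and the rule that only failover-enabled slots
wait are all assumptions made for illustration.

```python
def can_decode_upto(failover_enabled, target_lsn, standby_slot_names, physical_lsns):
    """Decide whether a logical slot may decode changes up to target_lsn.

    standby_slot_names: physical slot names listed on the primary.
    physical_lsns: mapping of physical slot name to the LSN that standby
                   has confirmed (integers here for simplicity).
    """
    # Assumption: slots not marked for failover never wait on standbys.
    if not failover_enabled:
        return True
    # Every listed physical slot must have reached target_lsn; a listed
    # slot with no known position is treated as not caught up.
    return all(physical_lsns.get(name, 0) >= target_lsn
               for name in standby_slot_names)
```

For example, with standbys at LSN 100 and 80, decoding up to 90 would
proceed if only the first is listed and would wait if both are.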

Thoughts?

--
With Regards,
Amit Kapila.
