Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-03-29 09:05:22
Message-ID: CAA4eK1+hZbfEsWMoV48bJF_W5Xy2EAXb5X+S6SNp7C3ac4A_Ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > Attach a new version patch which fixed an un-initialized variable
> > > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > > 040 tap-test so that we can analyze the possible CFbot failures easily.
> > > >
> > >
> > > Thanks!
> > >
> > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > > + {
> > > + /*
> > > + * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > > + * fast-forward logical decoding, we ensure that the required
> > > snapshots
> > > + * are saved to disk. This enables logical decoding to quickly
> > > reach a
> > > + * consistent point at the restart_lsn, eliminating the risk of
> > > missing
> > > + * data during snapshot creation.
> > > + */
> > > +
> > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > > +
> > > found_consistent_point);
> > > + ReplicationSlotsComputeRequiredLSN();
> > > + updated_lsn = true;
> > > + }
> > >
> > > Instead of using pg_logical_replication_slot_advance() for each synced slot and
> > > during sync cycles what about?:
> > >
> > > - keep sync slot synchronization as it is currently (not using
> > > pg_logical_replication_slot_advance())
> > > - create "an hidden" logical slot if sync slot feature is on
> > > - at the time of promotion use pg_logical_replication_slot_advance() on this
> > > hidden slot only to advance to the max lsn of the synced slots
> > >
> > > I'm not sure that would be enough, just asking your thoughts on this (benefits
> > > would be to avoid calling pg_logical_replication_slot_advance() on each sync
> > > slots and during the sync cycles).
> >
> > Thanks for the idea !
> >
> > I considered about this. I think advancing the "hidden" slot on promotion may be a
> > bit late, because if we cannot reach the consistent point after advancing the
> > "hidden" slot, then it means we may need to remove all the synced slots as we
> > are not sure if they are usable(will not loss data) after promotion.
>
> What about advancing the hidden slot during the sync cycles then?
>
> > The current approach is to mark such un-consistent slot as temp and persist
> > them once it reaches consistent point, so that user can ensure the slot can be
> > used after promotion once persisted.
>
> Right, but do we need to do so for all the sync slots? Would a single hidden
> slot be enough?
>

Even if we mark one of the synced slots as persistent without reaching
a consistent state, it could create a problem after promotion. And,
how a single hidden slot would serve the purpose, different synced
slots will have different restart/confirmed_flush LSN and we won't be
able to perform advancing for those using a single slot. For example,
say for first synced slot, it has not reached a consistent state and
then how can it try for the second slot? This sounds quite tricky to
make work. We should go with something simple where the chances of
introducing bugs are lesser.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2024-03-29 09:14:30 Remove excessive trailing semicolons
Previous Message Alexander Lakhin 2024-03-29 08:59:59 Re: To what extent should tests rely on VACUUM ANALYZE?