Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-11 09:02:45
Message-ID: CAJpy0uBEWO_8vxJFss0CaAw28QgQtKHh_bqs9iZS3YoyOe4Wyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 11, 2023 at 2:20 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > PFA v43, changes are:
> > > >
> > >
> > > I wanted to discuss 0003 patch about cascading standby's. It is not
> > > clear to me whether we want to allow physical standbys to further wait
> > > for cascading standby to sync their slots. If we allow such a feature
> > > one may expect even primary to wait for all the cascading standby's
> > > because otherwise still logical subscriber can be ahead of one of the
> > > cascading standby. I feel even if we want to allow such a behaviour we
> > > can do it later once the main feature is committed. I think it would
> > > be good to just allow logical walsenders on primary to wait for
> > > physical standbys represented by GUC 'standby_slot_names'. If we agree
> > > on that then it would be good to prohibit setting this GUC on standby
> > > or at least it should be a no-op even if this GUC should be set on
> > > physical standby.
> > >
> > > Thoughts?
> >
> > IMHO, why not keep the behavior consistent across primary and standby?
> > I mean if it doesn't require a lot of new code/design addition then
> > it should be the user's responsibility. I mean if the user has set
> > 'standby_slot_names' on standby then let standby also wait for
> > cascading standby to sync their slots? Is there any issue with that
> > behavior?
> >
>
> Without waiting for cascading standby on primary, it won't be helpful
> to just wait on standby.
>
> Currently logical walsenders on primary waits for physical standbys to
> take changes before they update their own logical slots. But they wait
> only for their immediate standbys and not for cascading standbys.
> Although, on first standby, we do have logic where slot-sync workers
> wait for cascading standbys before they update their own slots (synced
> ones, see patch3). But this does not guarantee that logical
> subscribers on primary will never be ahead of the cascading standbys.
> Let us consider this timeline:
>
> t1: logical walsender on primary waiting for standby1 (first standby).
> t2: physical walsender on standby1 is stuck and thus there is delay in
> sending these changes to standby2 (cascading standby).
> t3: standby1 has taken changes and sends confirmation to primary.
> t4: logical walsender on primary receives confirmation from standby1
> and updates slot, logical subscribers of primary also receives the
> changes.
> t5: standby2 has not received changes yet as physical walsender on
> standby1 is still stuck, slotsync worker still waiting for standby2
> (cascading) before it updates its own slots (synced ones).
> t6: standby2 is promoted to become primary.
>
> Now we are in a state wherein primary, logical subscriber and first
> standby has some changes but cascading standby does not. And logical
> slots on primary were updated w/o confirming if cascading standby has
> taken changes or not. This is a problem and we do not have a simple
> solution for this yet.
>
> thanks
> Shveta

PFA v45, changes in patch002:

--Addressed comments in [1] and [2]
--Added holistic test case for patch02. Thanks Nisha for the test
implementation.

[1]: https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1KbhdjKqui%3Dfr4Ny2TwGAFU9WLWTdypN%2BWG0WEfnBR%3D4w%40mail.gmail.com

thanks
Shveta

Attachment Content-Type Size
v45-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 145.9 KB
v45-0003-Allow-slot-sync-worker-to-wait-for-the-cascading.patch application/octet-stream 7.9 KB
v45-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 91.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-12-11 09:11:26 Re: Synchronizing slots from primary to standby
Previous Message shveta malik 2023-12-11 08:50:51 Re: Synchronizing slots from primary to standby