Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-06 06:18:47
Message-ID: CAJpy0uCAzkua8KAQaLNnYKOJ56x3yJ9kRfDxL8Enp5Li8bzhdQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > On 12/5/23 12:32 PM, Amit Kapila wrote:
> > > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >>
> > >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> > >> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >>>>
> > >>>
> > >>> Maybe another option could be to give the walreceiver a way to let the slot sync
> > >>> worker know that it (the walreceiver) was not able to start due to a non-existent
> > >>> replication slot on the primary? (That way we'd avoid the slot sync worker having
> > >>> to talk to the primary.)
> > >>
> > >> Few points:
> > >> 1) I think if we do it, we should do it in a generic way, i.e. the slotsync
> > >> worker should go to no-op if the walreceiver is not able to start for
> > >> any reason, and not only due to an invalid primary_slot_name.
> > >> 2) Secondly, the slotsync worker needs to make sure it has synced the
> > >> slots so far, i.e. the worker should not go to no-op immediately on seeing
> > >> a missing WalRcv process if there are pending slots to be synced.
> > >>
> > >
> > > Won't it be better to just ping and check the validity of
> > > 'primary_slot_name' at the start of slot-sync and whenever it is
> > > changed? I think it would be better to avoid adding a dependency on
> > > walreceiver state as that sounds like needless complexity.
> >
> > I think the overall extra complexity is linked to the fact that we first
> > want to ensure that the slots are in sync before shutting down the
> > sync slot worker.
> >
> > I think that talking to the primary or relying on the walreceiver state
> > is "just" what would trigger the decision to shut down the sync slot worker.
> >
> > Relying on the walreceiver state looks better to me (as it avoids possibly
> > useless round trips with the primary).
> >
>
> But the round trip will only happen once in the beginning and if the user
> changes the GUC primary_slot_name, which shouldn't be that often.
>
> > Also the walreceiver could be down for multiple reasons, and I think there
> > is no point in having a sync slot worker running if the slots are in sync and
> > there is no walreceiver running (even if primary_slot_name is a valid one).
> >
>
> I feel that is indirectly relying on the fact that the primary won't
> advance logical slots unless physical standby has consumed data.

Yes, that is the basis of this discussion. But on rethinking this, if
the user has not set 'standby_slot_names' on the primary in the first place,
then even if the walreceiver on the standby is down, slots on the primary will
keep on advancing and thus we need to sync. We have no check currently
that mandates users to set standby_slot_names.
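
To make the condition concrete, here is a rough sketch of the decision
being discussed. It is only an illustration, not code from the patch: the
struct, its fields, and the function name are all hypothetical, and it
assumes the standby somehow knows whether standby_slot_names is set on the
primary, which is not something we check today.

```c
#include <stdbool.h>

/*
 * Hypothetical illustration only; none of these names exist in the patch.
 * Inputs the slot sync worker would need in order to decide whether it
 * can go to no-op.
 */
typedef struct SlotSyncState
{
    bool walreceiver_running;     /* is a walreceiver active on the standby? */
    bool standby_slot_names_set;  /* ASSUMPTION: known to the standby somehow */
    bool slots_pending;           /* are some slots not yet fully synced? */
} SlotSyncState;

/*
 * Returns true only when going to no-op cannot miss slot advancement:
 * nothing is pending, the walreceiver is down, and the primary is
 * configured to hold logical slots back until the standby has consumed
 * the data. If standby_slot_names is not set, the slots on the primary
 * can keep advancing while the walreceiver is down, so the worker must
 * keep syncing.
 */
static bool
slotsync_can_idle(const SlotSyncState *s)
{
    if (s->slots_pending)
        return false;           /* finish syncing what we already know about */
    if (s->walreceiver_running)
        return false;           /* normal operation, keep syncing */
    return s->standby_slot_names_set;
}
```

With the walreceiver down and nothing pending, this returns true only when
standby_slot_names is set; without that setting the worker has to stay
active, which is the case described above.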

> Now,
> it is possible that the slot-sync worker lags behind and still needs to
> sync more data for slots, in which case it makes sense for the slot-sync
> worker to be alive. I think we can try to avoid checking walreceiver status
> till we can get more data to avoid the problem I mentioned but it
> doesn't sound like a clean way to achieve our purpose.
>
