Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-05 11:32:49
Message-ID: CAA4eK1+AumKenLjtVW2y4CpxBr_bo_AVZ67RWdDeJFt+Kgrj0A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > >
> > >> ~~~
> > >> 4. primary_slot_name GUC value test:
> > >>
> > >> When standby is started with a non-existing primary_slot_name, the
> > >> wal-receiver gives an error but the slot-sync worker does not raise
> > >> any error/warning. It is no-op though as it has a check 'if
> > >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this
> > >> okay or shall the slot-sync worker too raise an error and exit?
> > >>
> > >> In another case, when the standby is started with a valid primary_slot_name
> > >> that is then changed to an invalid value at runtime, the walreceiver
> > >> starts giving errors but the slot-sync worker keeps on running. In this
> > >> case, unlike the previous one, it does not even go to no-op mode (as
> > >> it sees a valid WalRcv->latestWalEnd from the earlier run) and keeps
> > >> pinging the primary repeatedly for slots. Should it error out here,
> > >> or at least be a no-op until we give a valid primary_slot_name?
> > >>
> > >
> >
> > Nice catch, thanks!
> >
> > > I reviewed it. There is no way to test the existence/validity of
> > > 'primary_slot_name' on standby without making a connection to primary.
> > > > If primary_slot_name is invalid from the start, the slot-sync worker will
> > > > be a no-op (as you tested) as WalRcv->latestWalEnd will be invalid, and
> > > > if 'primary_slot_name' is changed to an invalid value at runtime, the
> > > > slot-sync worker will still keep on pinging the primary. But that should be
> > > > okay (in fact needed) as it needs to sync at least the previous slot's
> > > > positions (in case it was delayed in doing so for some reason earlier).
> > > > And once the slots are up-to-date on the standby, even if the worker pings
> > > > the primary, it will not see any change in the slots' LSNs and will thus
> > > > go for a longer nap. I think it is not worth the effort to introduce the
> > > complexity of checking validity of 'primary_slot_name' on primary from
> > > standby for this rare scenario.
> > >
> >
> > Maybe another option could be to give the walreceiver a way to let the slot sync
> > worker know that it (the walreceiver) was not able to start due to a non-existent
> > replication slot on the primary? (that way we'd avoid the slot sync worker having
> > to talk to the primary).
>
> A few points:
> 1) I think if we do it, we should do it in a generic way, i.e. the slotsync
> worker should go to no-op if the walreceiver is not able to start for
> any reason, and not only due to an invalid primary_slot_name.
> 2) Secondly, slotsync worker needs to make sure it has synced the
> slots so far i.e. worker should not go to no-op immediately on seeing
> missing WalRcv process if there are pending slots to be synced.
>

Wouldn't it be better to just ping the primary and check the validity of
'primary_slot_name' at the start of slot-sync, and again whenever it is
changed? I think it would be better to avoid adding a dependency on
walreceiver state, as that sounds like needless complexity.
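
To illustrate the idea, here is a minimal standalone sketch in C. It is a
simulation, not PostgreSQL code: SlotSyncState, slot_exists_on_primary()
and slot_sync_may_proceed() are hypothetical names, and the lookup against
a static array stands in for whatever query the worker would actually send
to the primary. The point is only that the check runs once at start and
again when the setting changes, without consulting walreceiver state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the physical slots that exist on the primary. */
static const char *const primary_slots[] = {"standby1_slot", "standby2_slot"};

/* Hypothetical: a real worker would ask the primary over its connection. */
static bool
slot_exists_on_primary(const char *name)
{
    size_t n = sizeof(primary_slots) / sizeof(primary_slots[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(primary_slots[i], name) == 0)
            return true;
    return false;
}

/* Hypothetical per-worker state; zero-initialize before first use. */
typedef struct SlotSyncState
{
    char validated_name[64];   /* last primary_slot_name that was checked */
    bool name_is_valid;        /* result of that check */
    int  checks_done;          /* number of validity pings made so far */
} SlotSyncState;

/*
 * Returns true if the worker may sync in this cycle.  The primary is
 * pinged only on the first call and whenever primary_slot_name differs
 * from the value checked last time; otherwise the cached result is used.
 */
static bool
slot_sync_may_proceed(SlotSyncState *st, const char *primary_slot_name)
{
    if (st->checks_done == 0 ||
        strcmp(st->validated_name, primary_slot_name) != 0)
    {
        strncpy(st->validated_name, primary_slot_name,
                sizeof(st->validated_name) - 1);
        st->validated_name[sizeof(st->validated_name) - 1] = '\0';
        st->name_is_valid = slot_exists_on_primary(primary_slot_name);
        st->checks_done++;
    }
    return st->name_is_valid;
}
```

Calling slot_sync_may_proceed() repeatedly with an unchanged name costs no
extra round trip; changing the name to one that does not exist makes it
return false until a valid name is configured again.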

--
With Regards,
Amit Kapila.
