Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-14 05:38:22
Message-ID: CAA4eK1L2ts=gfiF4aw7-DH8HWj29s08hVRq-Ff8=mjfdUXx8CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 13, 2023 at 3:53 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> 12.
> +static void
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
> + bool *slot_updated)
> +{
> + ReplicationSlot *s;
> + ReplicationSlot *slot;
> + char sync_state = '\0';
>
> In my previous review [1]#33a I thought it was strange to assign the
> sync_state (which is essentially an enum) to some meaningless value,
> so I suggested it should be set to SYNCSLOT_STATE_NONE in the
> declaration. The reply [2] was "No, that will change the flow. It
> should stay uninitialized if the slot is not found."
>
> But I am not convinced there is any flow problem. Also,
> SYNCSLOT_STATE_NONE seems the naturally correct default for something
> with no state. It cannot be found and be SYNCSLOT_STATE_NONE at the
> same time (that is reported as an ERROR "skipping sync of slot") so I
> see no problem.
>
> The CURRENT code is like this:
>
> /* Slot created by the slot sync worker exists, sync it */
> if (sync_state)
> {
> Assert(sync_state == SYNCSLOT_STATE_READY || sync_state ==
> SYNCSLOT_STATE_INITIATED);
> ...
> }
> /* Otherwise create the slot first. */
> else
> {
> ...
> }
>
> AFAICT that could easily be changed to like below, with no change to
> the logic, and it avoids setting strange values.
>
> SUGGESTION.
>
> if (sync_state == SYNCSLOT_STATE_NONE)
> {
> /* Slot not found. Create it. */
> ..
> }
> else
> {
> Assert(sync_state == SYNCSLOT_STATE_READY || sync_state ==
> SYNCSLOT_STATE_INITIATED);
> ...
> }
>

I think instead of creating syncslot based on syncstate, it would be
better to create it when we don't find it via
SearchNamedReplicationSlot(). That will avoid the need to initialize
the syncstate and I think it would make code in this area look better.

> ~~~
>
> 13. synchronize_one_slot
>
> +static void
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
> + bool *slot_updated)
>
> This *slot_updated parameter looks dubious. It is used in a loop from
> the caller to mean that ANY slot was updated -- e.g. maybe it is true
> or false on entry to this function.
>
> But, Instead of having some dependency between this function and the
> caller, IMO it makes more sense if we would make this just a boolean
> function in the first place (e.g. was updated? T/F)
>
> Then the caller can also be written more easily like:
>
> some_slot_updated |= synchronize_one_slot(wrconn, remote_slot);
>

+1.

>
> 23. slotsync_reread_config
>
> + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
> + Assert(old_dbname);
>
> (This is same comment as old review [1]#61)
>
> Hmm. I still don't see why this extraction of the dbname cannot be
> deferred until later when you know the PrimaryConnInfo has changed,
> otherwise, it might be redundant to do this. Shveta replied [2] that
> "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm
> not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a
> string search -- Why can't you defer this until you know
> conninfoChanged is true, and then to get the old_dbname, you can just
> pass the old_primary_conninfo. E.g. call like
> walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am
> mistaken.
>

I think we should just restart if any one of the information is
changed with a message like: "slotsync worker will restart because of
a parameter change". This would be similar to what we do apply worker
in maybe_reread_subscription().

>
> 28.
> + /*
> + * Is this a slot created by a sync-slot worker?
> + *
> + * Only relevant for logical slots on the physical standby.
> + */
> + char sync_state;
> +
>
> (probably I am repeating a previous thought here)
>
> The comment says the field is only relevant for standby, and that's
> how I've been visualizing it, and why I had previously suggested even
> renaming it to 'standby_sync_state'. However, replies are saying that
> after failover these sync_states also have "some meaning for the
> primary server".
>
> That's the part I have trouble understanding. IIUC the server states
> are just either all 'n' (means nothing) or 'r' because they are just
> leftover from the old standby state. So, does it *truly* have meaning
> for the server? Or should those states somehow be removed/ignored on
> the new primary? Anyway, the point is that if this field does have
> meaning also on the primary (I doubt) then those details should be in
> this comment. Otherwise "Only relevant ... on the standby" is too
> misleading.
>

I think this deserves more comments.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-12-14 05:53:02 RE: Synchronizing slots from primary to standby
Previous Message Dilip Kumar 2023-12-14 05:22:57 Re: logical decoding and replication of sequences, take 2