Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-20 03:33:27
Message-ID: CAJpy0uCpwShc76zBen4rJsL4FOc3KxC8J8dEuW-KG=a31J6Nfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>
> I've reviewed the v91 patch. Here are random comments:

Thanks for the comments.

> ---
> /*
> * Checks the remote server info.
> *
> - * We ensure that the 'primary_slot_name' exists on the remote server and the
> - * remote server is not a standby node.
> + * Check whether we are a cascading standby. For non-cascading standbys, it
> + * also ensures that the 'primary_slot_name' exists on the remote server.
> */
>
> IIUC what the validate_remote_info() does doesn't not change by this
> patch, so the previous comment seems to be clearer to me.
>
> ---
> if (remote_in_recovery)
> + {
> + /*
> + * If we are a cascading standby, no need to check further for
> + * 'primary_slot_name'.
> + */
> ereport(ERROR,
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("cannot synchronize replication slots from a
> standby server"));
> + }
> + else
> + {
> + bool primary_slot_valid;
>
> - primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> - Assert(!isnull);
> + primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> + Assert(!isnull);
>
> - if (!primary_slot_valid)
> - ereport(ERROR,
> - errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> - errmsg("bad configuration for slot synchronization"),
> - /* translator: second %s is a GUC variable name */
> - errdetail("The replication slot \"%s\" specified by
> \"%s\" does not exist on the primary server.",
> - PrimarySlotName, "primary_slot_name"));
> + if (!primary_slot_valid)
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("bad configuration for slot synchronization"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The replication slot \"%s\" specified
> by \"%s\" does not exist on the primary server.",
> + PrimarySlotName, "primary_slot_name"));
> + }
>
> I think it's a refactoring rather than changes required by the
> slotsync worker. We can do that in a separate patch but why do we need
> this change in the first place?

In v90, this refactoring was made due to the fact that
validate_remote_info() was supposed to behave differently for SQL
function and slot-sync worker. SQL-function was supposed to ERROR out
while the worker was supposed to LOG and become no-op. And thus the
change was needed. In v91, we made this functionality same i.e. both
sql function and worker will error out but missed to remove this
refactoring. Thanks for catching this, I will revert it in the next
version. To match the refactoring, I made the comment change too, will
revert that as well.

> ---
> + ValidateSlotSyncParams(ERROR);
> +
> /* Load the libpq-specific functions */
> load_file("libpqwalreceiver", false);
>
> - ValidateSlotSyncParams();
> + (void) CheckDbnameInConninfo();
>
> Is there any reason why we move where to check the parameters?

Earlier DBname verification was done inside ValidateSlotSyncParams()
and thus it was needed to load 'libpqwalreceiver' before we call this
function. Now we have moved dbname verification in a separate call and
thus the above order got changed. ValidateSlotSyncParams() is a common
function used by SQL function and worker. Earlier slot sync worker was
checking all the GUCs after starting up and was exiting each time any
GUC was invalid. It was suggested that it would be better to check the
GUCs before starting the slot sync worker in the postmaster itself,
making the ValidateSlotSyncParams() move to postmaster (see
SlotSyncWorkerAllowed). But it was not a good idea to load libpq in
postmaster and thus we moved libpq related verification out of
ValidateSlotSyncParams(). This resulted in the above change. I hope
it answers your query.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-20 03:43:54 Re: Synchronizing slots from primary to standby
Previous Message Andrei Lepikhov 2024-02-20 03:18:23 Re: POC, WIP: OR-clause support for indexes