RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-12-20 12:42:28
Message-ID: OS0PR01MB57166472DB3D01F7407259179496A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, December 20, 2023 4:03 PM Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:

Hi,

>
> Dear Amit, Shveta,
>
> > > walsender.c
> > >
> > > 01. WalSndWaitForStandbyConfirmation
> > >
> > > ```
> > > + sleeptime =
> WalSndComputeSleeptime(GetCurrentTimestamp());
> > > ```
> > >
> > > It works well, but I'm not sure whether we should use
> > WalSndComputeSleeptime()
> > > because the function won't be called by walsender.
> > >
> >
> > I don't think it is correct to use this function because it is
> > walsender specific, for example, it uses 'last_reply_timestamp' which
> > won't be even initialized in the backend environment. We need to
> > probably use a different logic for sleep here or need to use a
> > hard-coded value.
>
> Oh, you are right. I haven't look until the func.
>
> > I think we should change the name of functions like
> > WalSndWaitForStandbyConfirmation() as they are no longer used by
> > walsender. IIRC, earlier, we had a common logic to wait from both
> > walsender and SQL APIs which led to this naming but that is no longer
> > true with the latest patch.
>
> How about "WaitForStandbyConfirmation", which is simpler? There are some
> functions like "WaitForParallelWorkersToFinish", "WaitForProcSignalBarrier"
> and so on.

Thanks for the comments. I think WaitForStandbyConfirmation is OK.
And I removed the WalSnd prefix for these functions and move them to
slot.c where the standby_slot_names is declared.

>
> > > 02.WalSndWaitForStandbyConfirmation
> > >
> > > ```
> > > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv,
> > sleeptime,
> > > +
> > WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION)
> > > ```
> > >
> > > Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it
> > > should be
> > avoided.
> > >
> >
> > Agreed. So, how about using
> > WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION
> > so that we can use it both from the backend and walsender?
>
> Seems right. Note again that a description of .txt file must be also fixed.

Changed.

>
> Anyway, further comments on v50-0001.
>
> ~~~~~
> protocol.sgml
>
> 01. create_replication_slot
>
> ```
> + <varlistentry>
> + <term><literal>FAILOVER { 'true' | 'false' }</literal></term>
> + <listitem>
> + <para>
> + If true, the slot is enabled to be synced to the physical
> + standbys so that logical replication can be resumed after failover.
> + </para>
> + </listitem>
> + </varlistentry>
> ```
>
> IIUC, the true/false is optional. libpqwalreceiver does not add the boolean.
> Also you can follow the notation of `TWO_PHASE`.

Changed.

>
> 02. alter_replication_slot
>
> ```
> + <variablelist>
> + <varlistentry>
> + <term><literal>FAILOVER { 'true' | 'false' }</literal></term>
> + <listitem>
> + <para>
> + If true, the slot is enabled to be synced to the physical
> + standbys so that logical replication can be resumed after failover.
> + </para>
> + </listitem>
> + </varlistentry>
> + </variablelist>
> ```
>
> Apart from above, this boolean is mandatory, right?
> But you can follow other notation.
>

Right, changed it to optional to be consistent with others.

>
> ~~~~~~~
> slot.c
>
> 03. validate_standby_slots
>
> ```
> + /* Need a modifiable copy of string. */
> ...
> + /* Verify syntax and parse string into a list of identifiers. */
> ```
>
> Unnecessary comma?

You mean comma or period ? I think the current style is OK.

>
> 04. validate_standby_slots
>
> ```
> + if (!ok || !ReplicationSlotCtl)
> + {
> + pfree(rawname);
> + list_free(elemlist);
> + return ok;
> + }
> ```
>
> It may be more efficient to exit earlier when ReplicationSlotCtl is NULL.

I think even if ReplicationSlotCtl is NULL, we still need to check the syntax
of the slot names.

>
> ~~~~~~~
> walsender.c
>
> 05. PhysicalWakeupLogicalWalSnd
>
> ```
> +/*
> + * Wake up the logical walsender processes with failover-enabled slots
> +if the
> + * physical slot of the current walsender is specified in
> +standby_slot_names
> + * GUC.
> + */
> +void
> +PhysicalWakeupLogicalWalSnd(void)
> ```
>
> The function can be called from backend processes, but you said "the current
> walsender"
> in the comment.

Changed the words.

>
> 06. WalSndRereadConfigAndReInitSlotList
>
> ```
> + char *pre_standby_slot_names;
> +
> + ProcessConfigFile(PGC_SIGHUP);
> +
> + /*
> + * If we are running on a standby, there is no need to reload
> + * standby_slot_names since we do not support syncing slots to
> cascading
> + * standbys.
> + */
> + if (RecoveryInProgress())
> + return;
> +
> + pre_standby_slot_names = pstrdup(standby_slot_names);
> ```
>
> I felt that we must preserve pre_standby_slot_names before calling
> ProcessConfigFile().
>

Good catch. Fixed.

>
> 07. WalSndFilterStandbySlots
>
> I felt the prefix "WalSnd" may not be needed because both backend processes
> and walsender will call the function.

Right, renamed.

Attach the V51 patch set which addressed Kuroda-san's comments.
I also tried to improve the test in 0003 to make it stable.

Best Regards,
Hou zj

Attachment Content-Type Size
v51-0003-Additional-test-to-validate-the-restart_lsn-of-s.patch application/octet-stream 3.6 KB
v51-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 147.2 KB
v51-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 89.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-12-20 12:43:55 RE: Synchronizing slots from primary to standby
Previous Message Masahiko Sawada 2023-12-20 12:26:36 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)