RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-03-04 13:26:47
Message-ID: OS0PR01MB571691C615059F378E9B6D5D94232@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, March 4, 2024 9:55 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Sun, Mar 3, 2024 at 6:51 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > On Sunday, March 3, 2024 7:47 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> > >
>
> > > 3.
> > > + <note>
> > > + <para>
> > > + Value <literal>*</literal> is not accepted as it is inappropriate to
> > > + block logical replication for physical slots that either lack
> > > + associated standbys or have standbys associated that are
> > > + not
> > > enabled
> > > + for replication slot synchronization. (see
> > > + <xref
> > > linkend="logicaldecoding-replication-slots-synchronization"/>).
> > > + </para>
> > > + </note>
> > >
> > > Why does the document need to provide an excuse/reason for the rule?
> > > You could just say something like:
> > >
> > > SUGGESTION
> > > The slots must be named explicitly. For example, specifying wildcard
> > > values like <literal>*</literal> is not permitted.
> >
> > As suggested by Amit, I moved this to code comments.
>
> Was the total removal of this note deliberate? I only suggested removing the
> *reason* for the rule, not the entire rule. Otherwise, the user won't know to
> avoid doing this until they try it and get an error.

OK, Added.

>
>
> > >
> > > 9. NeedToWaitForWal
> > >
> > > + /*
> > > + * Check if the standby slots have caught up to the flushed position.
> > > + It
> > > + * is good to wait up to flushed position and then let it send the
> > > + changes
> > > + * to logical subscribers one by one which are already covered in
> > > + flushed
> > > + * position without needing to wait on every change for standby
> > > + * confirmation. Note that after receiving the shutdown signal, an
> > > + ERROR
> > > + * is reported if any slots are dropped, invalidated, or inactive.
> > > + This
> > > + * measure is taken to prevent the walsender from waiting indefinitely.
> > > + */
> > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event))
> > > + return true;
> > >
> > > I felt it was confusing things for this function to also call to the
> > > other one -- it seems an overlapping/muddling of the purpose of these.
> > > I think it will be easier to understand if the calling code just
> > > calls to one or both of these functions as required.
> >
> > Same as Amit, I didn't change this.
>
> AFAICT my previous review comment was misinterpreted. Please see [1] for
> more details.
>
> ~~~~
>
> Here are some more review comments for v104-00001

Thanks!

>
> ======
> Commit message
>
> 1.
> Additionally, The SQL functions pg_logical_slot_get_changes,
> pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
> to wait for the replication slots specified in 'standby_slot_names' to catch up
> before returning.
>
> ~
>
> Maybe that should be expressed using similar wording as the docs...
>
> SUGGESTION
> Additionally, The SQL functions ... are modified. Now, when used with
> failover-enabled logical slots, these functions will block until all physical slots
> specified in 'standby_slot_names' have confirmed WAL receipt.

Changed.

>
> ======
> doc/src/sgml/config.sgml
>
> 2.
> + <function>pg_logical_slot_peek_changes</function></link>,
> + when used with failover enabled logical slots, will block until all
> + physical slots specified in
> <varname>standby_slot_names</varname> have
> + confirmed WAL receipt.
>
> /failover enabled logical slots/failover-enabled logical slots/

Changed. Note that for this comment and remaining comments,
I used the later version we agreed(logical failover slot).

>
> ======
> doc/src/sgml/func.sgml
>
> 3.
> + The function may be blocked if the specified slot is a failover enabled
> + slot and <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> + is configured.
> </para></entry>
>
> /a failover enabled slot//a failover-enabled slot/

Changed.

>
> ~~~
>
> 4.
> + slot may return to an earlier position. The function may be blocked if
> + the specified slot is a failover enabled slot and
> + <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> + is configured.
>
> /a failover enabled slot//a failover-enabled slot/

Changed.

>
> ======
> src/backend/replication/slot.c
>
> 5.
> +/*
> + * Wait for physical standbys to confirm receiving the given lsn.
> + *
> + * Used by logical decoding SQL functions that acquired failover enabled slot.
> + * It waits for physical standbys corresponding to the physical slots
> +specified
> + * in the standby_slot_names GUC.
> + */
> +void
> +WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
>
> /failover enabled slot/failover-enabled slot/

Changed.

>
> ~~~
>
> 6.
> + /*
> + * Don't need to wait for the standby to catch up if the current
> + acquired
> + * slot is not a failover enabled slot, or there is no value in
> + * standby_slot_names.
> + */
>
> /failover enabled slot/failover-enabled slot/

Changed.

>
> ======
> src/backend/replication/slotfuncs.c
>
> 7.
> +
> + /*
> + * Wake up logical walsenders holding failover enabled slots after
> + * updating the restart_lsn of the physical slot.
> + */
> + PhysicalWakeupLogicalWalSnd();
>
> /failover enabled slots/failover-enabled slots/

Changed.

>
> ======
> src/backend/replication/walsender.c
>
> 8.
> +/*
> + * Wake up the logical walsender processes with failover enabled slots
> +if the
> + * currently acquired physical slot is specified in standby_slot_names GUC.
> + */
> +void
> +PhysicalWakeupLogicalWalSnd(void)
>
> /failover enabled slots/failover-enabled slots/

Changed.

>
> 9.
> +/*
> + * Returns true if not all standbys have caught up to the flushed
> +position
> + * (flushed_lsn) when the current acquired slot is a failover enabled
> +logical
> + * slot and we are streaming; otherwise, returns false.
> + *
> + * If returning true, the function sets the appropriate wait event in
> + * wait_event; otherwise, wait_event is set to 0.
> + */
> +static bool
> +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
> + uint32 *wait_event)
>
> 9a.
> /failover enabled logical slot/failover-enabled logical slot/

Changed.

>
> ~
>
> 9b.
> Probably that function name should be plural.
>
> /NeedToWaitForStandby/NeedToWaitForStandbys/
>
> ~~~
>
> 10.
> +/*
> + * Returns true if we need to wait for WALs to be flushed to disk, or
> +if not
> + * all standbys have caught up to the flushed position (flushed_lsn)
> +when the
> + * current acquired slot is a failover enabled logical slot and we are
> + * streaming; otherwise, returns false.
> + *
> + * If returning true, the function sets the appropriate wait event in
> + * wait_event; otherwise, wait_event is set to 0.
> + */
> +static bool
> +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
> + uint32 *wait_event)
>
> /failover enabled logical slot/failover-enabled logical slot/

Changed.

>
> ~~~
>
> 11. WalSndWaitForWal
>
> + /*
> + * Within the loop, we wait for the necessary WALs to be flushed to
> + * disk first, followed by waiting for standbys to catch up if there
> + * are enought WALs or upon receiving the shutdown signal. To avoid
> + * the scenario where standbys need to catch up to a newer WAL
> + * location in each iteration, we update our idea of the currently
> + * flushed position only if we are not waiting for standbys to catch
> + * up.
> + */
>
> typo
>
> /enought/enough/

Fixed.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-03-04 13:28:04 RE: Synchronizing slots from primary to standby
Previous Message Zhijie Hou (Fujitsu) 2024-03-04 13:26:13 RE: Synchronizing slots from primary to standby