RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-03-05 07:20:15
Message-ID: OS0PR01MB5716AAC5E4C73CC7DAD0855294222@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, March 5, 2024 8:40 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for v105-0001
>
> ==========
> doc/src/sgml/config.sgml
>
> 1.
> + <para>
> + The standbys corresponding to the physical replication slots in
> + <varname>standby_slot_names</varname> must configure
> + <literal>sync_replication_slots = true</literal> so they can receive
> + logical failover slots changes from the primary.
> + </para>
>
> /slots changes/slot changes/

Changed.

>
> ======
> doc/src/sgml/func.sgml
>
> 2.
> + The function may be waiting if the specified slot is a logical failover
> + slot and <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> + is configured.
>
> I know this has been through multiple versions already, but this
> latest wording "may be waiting..." doesn't seem very good to me.
>
> How about one of these?
>
> * The function may not be able to return immediately if the specified
> slot is a logical failover slot and standby_slot_names is configured.
>
> * The function return might be blocked if the specified slot is a
> logical failover slot and standby_slot_names is configured.
>
> * If the specified slot is a logical failover slot then the function
> will block until all physical slots specified in standby_slot_names
> have confirmed WAL receipt.
>
> * If the specified slot is a logical failover slot then the function
> will not return until all physical slots specified in
> standby_slot_names have confirmed WAL receipt.

I prefer the last one.

>
> ~~~
>
> 3.
> + slot may return to an earlier position. The function may be waiting if
> + the specified slot is a logical failover slot and
> + <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
>
>
> Same as previous review comment #2

Changed.

>
> ======
> src/backend/replication/slot.c
>
> 4. WaitForStandbyConfirmation
>
> + * Used by logical decoding SQL functions that acquired logical failover slot.
>
> IIUC it doesn't work like that. pg_logical_slot_get_changes_guts()
> calls here unconditionally (i.e. the SQL functions don't even check if
> they are failover slots before calling this) so the comment seems
> misleading/redundant.

I removed the "acquired logical failover slot.".

>
> ======
> src/backend/replication/walsender.c
>
> 5. NeedToWaitForWal
>
> + /*
> + * Check if the standby slots have caught up to the flushed position. It
> + * is good to wait up to the flushed position and then let the WalSender
> + * send the changes to logical subscribers one by one which are already
> + * covered by the flushed position without needing to wait on every change
> + * for standby confirmation.
> + */
> + if (NeedToWaitForStandbys(flushed_lsn, wait_event))
> + return true;
> +
> + *wait_event = 0;
> + return false;
> +}
> +
>
> 5a.
> The comment (or part of it?) seems misplaced because it is talking
> WalSender sending changes, but that is not happening in this function.
>
> Also, isn't what this is saying already described by the other comment
> in the caller? e.g.:
>
> + /*
> + * Within the loop, we wait for the necessary WALs to be flushed to
> + * disk first, followed by waiting for standbys to catch up if there
> + * are enough WALs or upon receiving the shutdown signal. To avoid the
> + * scenario where standbys need to catch up to a newer WAL location in
> + * each iteration, we update our idea of the currently flushed
> + * position only if we are not waiting for standbys to catch up.
> + */
>

I moved these comments based on Shveta's suggestion.

> ~
>
> 5b.
> Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line:
>
> return NeedToWaitForStandbys(flushed_lsn, wait_event);

Changed.

>
> ~~~
>
> 6. WalSndWaitForWal
>
> + /*
> + * Within the loop, we wait for the necessary WALs to be flushed to
> + * disk first, followed by waiting for standbys to catch up if there
> + * are enough WALs or upon receiving the shutdown signal. To avoid the
> + * scenario where standbys need to catch up to a newer WAL location in
> + * each iteration, we update our idea of the currently flushed
> + * position only if we are not waiting for standbys to catch up.
> + */
>
> Regarding that 1st sentence: maybe this logic used to be done
> explicitly "within the loop" but IIUC this logic is now hidden inside
> NeedToWaitForWal() so the comment should mention that.

Changed.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-03-05 07:21:24 RE: Synchronizing slots from primary to standby
Previous Message Zhijie Hou (Fujitsu) 2024-03-05 07:15:29 RE: Synchronizing slots from primary to standby