RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-11-06 01:30:58
Message-ID: OS0PR01MB57164C4C576A2BB8FEB61D6F94AAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, November 3, 2023 7:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>
> On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > Here is the new version patch set(V29) which addressed Peter
> > comments[1][2] and fixed one doc compile error.
> >
>
> Few comments:
> ==============
> 1.
> + <varlistentry id="sql-createsubscription-params-with-failover">
> + <term><literal>failover</literal> (<type>boolean</type>)</term>
> + <listitem>
> + <para>
> + Specifies whether the replication slot assocaited with the
> subscription
> + is enabled to be synced to the physical standbys so that logical
> + replication can be resumed from the new primary after failover.
> + The default is <literal>true</literal>.
>
> Why do you think it is a good idea to keep the default value as true?
> I think the user needs to enable standby for syncing slots which is not a default
> feature, so by default, the failover property should also be false. AFAICS, it is
> false for create_slot SQL API as per the below change; so that way also keeping
> default true for a subscription doesn't make sense.
> @@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION
> pg_create_logical_replication_slot(
> IN slot_name name, IN plugin name,
> IN temporary boolean DEFAULT false,
> IN twophase boolean DEFAULT false,
> + IN failover boolean DEFAULT false,
> OUT slot_name name, OUT lsn pg_lsn)
>
> BTW, the below change indicates that the code treats default as false; so, it
> seems to be a documentation error.

I think the document is wrong and fixed it.

>
> 2.
> -
> /*
> * Common option parsing function for CREATE and ALTER SUBSCRIPTION
> commands.
> *
>
> Spurious line removal.
>
> 3.
> + else if (opts.slot_name && failover_enabled) {
> + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover);
> + ereport(NOTICE, (errmsg("altered replication slot \"%s\" on
> + publisher", opts.slot_name))); }
>
> I think we can add a comment to describe why it makes sense to enable the
> failover property of the slot in this case. Can we change the notice message to:
> "enabled failover for replication slot \"%s\" on publisher"

Added.

>
> 4.
> libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
> - bool temporary, bool two_phase, CRSSnapshotAction snapshot_action,
> - XLogRecPtr *lsn)
> + bool temporary, bool two_phase, bool failover, CRSSnapshotAction
> + snapshot_action, XLogRecPtr *lsn)
> {
> PGresult *res;
> StringInfoData cmd;
> @@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const
> char *slotname,
> else
> appendStringInfoChar(&cmd, ' ');
> }
> -
> + if (failover)
> + {
> + appendStringInfoString(&cmd, "FAILOVER"); if (use_new_options_syntax)
> + appendStringInfoString(&cmd, ", "); else appendStringInfoChar(&cmd, '
> + '); }
>
> I don't see a corresponding change in repl_gram.y. I think the following part of
> the code needs to be changed:
> /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */
> | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT
> create_slot_options
>

I think after 0266e98, we started to use the new syntax(see the
generic_option_list rule) and we can avoid changing the repl_gram.y when adding
new options. The new failover can be detected when parsing the generic option
list(in parseCreateReplSlotOptions).

>
> 5.
> @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo
> fcinfo, bool confirm, bool bin
> NameStr(MyReplicationSlot->data.plugin),
> format_procedure(fcinfo->flinfo->fn_oid))));
> ..
> + if (XLogRecPtrIsInvalid(upto_lsn))
> + wal_to_wait = end_of_wal;
> + else
> + wal_to_wait = Min(upto_lsn, end_of_wal);
> +
> + /* Initialize standby_slot_names_list */ SlotSyncInitConfig();
> +
> + /*
> + * Wait for specified streaming replication standby servers (if any)
> + * to confirm receipt of WAL upto wal_to_wait.
> + */
> + WalSndWaitForStandbyConfirmation(wal_to_wait);
> +
> + /*
> + * The memory context used to allocate standby_slot_names_list will be
> + * freed at the end of this call. So free and nullify the list in
> + * order to avoid usage of freed list in the next call to this
> + * function.
> + */
> + SlotSyncFreeConfig();
>
> What if there is an error in WalSndWaitForStandbyConfirmation() before calling
> SlotSyncFreeConfig()? I think the problem you are trying to avoid by freeing it
> here can occur. I think it is better to do this in a logical decoding context and
> free the list along with it as we are doing in commit c7256e6564(see PG15).

I will analyze more about this case and update in next version.

> Also,
> it is better to allocate this list somewhere in
> WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots,
> that will make the code look neat and also avoid allocating this list when
> failover is not enabled for the slot.

Changed as suggested.

>
> 6.
> +/* ALTER_REPLICATION_SLOT slot */
> +alter_replication_slot:
> + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')'
>
> I think you need to update the docs for this new command. See existing docs
> [1].
>
> [1] - https://www.postgresql.org/docs/devel/protocol-replication.html

I think the doc for alter_replication_slot was added in V29.

Attach the V30 patch set which addressed above comments and fixed CFbot failures.

Best Regards,
Hou zj

Attachment Content-Type Size
v30-0003-Allow-slot-sync-workers-to-wait-for-the-cascadin.patch application/octet-stream 7.6 KB
v30-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 127.6 KB
v30-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 116.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-11-06 02:20:49 Re: pg_upgrade and logical replication
Previous Message jian he 2023-11-06 00:00:00 Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value