Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-03 11:31:43
Message-ID: CAA4eK1++T5K1SKE_=y8PPZa-VxkFrRPA8XjbttUCx9VpVXS-yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Here is the new version patch set(V29) which addressed Peter comments[1][2] and
> fixed one doc compile error.
>

Few comments:
==============
1.
+ <varlistentry id="sql-createsubscription-params-with-failover">
+ <term><literal>failover</literal> (<type>boolean</type>)</term>
+ <listitem>
+ <para>
+ Specifies whether the replication slot assocaited with the
subscription
+ is enabled to be synced to the physical standbys so that logical
+ replication can be resumed from the new primary after failover.
+ The default is <literal>true</literal>.

Why do you think it is a good idea to keep the default value as true?
I think the user needs to enable standby for syncing slots which is
not a default feature, so by default, the failover property should
also be false. AFAICS, it is false for create_slot SQL API as per the
below change; so that way also keeping default true for a subscription
doesn't make sense.
@@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION
pg_create_logical_replication_slot(
IN slot_name name, IN plugin name,
IN temporary boolean DEFAULT false,
IN twophase boolean DEFAULT false,
+ IN failover boolean DEFAULT false,
OUT slot_name name, OUT lsn pg_lsn)

BTW, the below change indicates that the code treats default as false;
so, it seems to be a documentation error.
@@ -157,6 +158,8 @@ parse_subscription_options(ParseState *pstate,
List *stmt_options,
opts->runasowner = false;
if (IsSet(supported_opts, SUBOPT_ORIGIN))
opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+ if (IsSet(supported_opts, SUBOPT_FAILOVER))
+ opts->failover = false;

2.
-
/*
* Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
*

Spurious line removal.

3.
+ else if (opts.slot_name && failover_enabled)
+ {
+ walrcv_alter_slot(wrconn, opts.slot_name, opts.failover);
+ ereport(NOTICE,
+ (errmsg("altered replication slot \"%s\" on publisher",
+ opts.slot_name)));
+ }

I think we can add a comment to describe why it makes sense to enable
the failover property of the slot in this case. Can we change the
notice message to: "enabled failover for replication slot \"%s\" on
publisher"

4.
libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
- bool temporary, bool two_phase, CRSSnapshotAction snapshot_action,
- XLogRecPtr *lsn)
+ bool temporary, bool two_phase, bool failover,
+ CRSSnapshotAction snapshot_action, XLogRecPtr *lsn)
{
PGresult *res;
StringInfoData cmd;
@@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const
char *slotname,
else
appendStringInfoChar(&cmd, ' ');
}
-
+ if (failover)
+ {
+ appendStringInfoString(&cmd, "FAILOVER");
+ if (use_new_options_syntax)
+ appendStringInfoString(&cmd, ", ");
+ else
+ appendStringInfoChar(&cmd, ' ');
+ }

I don't see a corresponding change in repl_gram.y. I think the
following part of the code needs to be changed:
/* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */
| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT
create_slot_options

You also need to update the docs for the same. See [1].

5.
@@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo
fcinfo, bool confirm, bool bin
NameStr(MyReplicationSlot->data.plugin),
format_procedure(fcinfo->flinfo->fn_oid))));
..
+ if (XLogRecPtrIsInvalid(upto_lsn))
+ wal_to_wait = end_of_wal;
+ else
+ wal_to_wait = Min(upto_lsn, end_of_wal);
+
+ /* Initialize standby_slot_names_list */
+ SlotSyncInitConfig();
+
+ /*
+ * Wait for specified streaming replication standby servers (if any)
+ * to confirm receipt of WAL upto wal_to_wait.
+ */
+ WalSndWaitForStandbyConfirmation(wal_to_wait);
+
+ /*
+ * The memory context used to allocate standby_slot_names_list will be
+ * freed at the end of this call. So free and nullify the list in
+ * order to avoid usage of freed list in the next call to this
+ * function.
+ */
+ SlotSyncFreeConfig();

What if there is an error in WalSndWaitForStandbyConfirmation() before
calling SlotSyncFreeConfig()? I think the problem you are trying to
avoid by freeing it here can occur. I think it is better to do this in
a logical decoding context and free the list along with it as we are
doing in commit c7256e6564(see PG15). Also, it is better to allocate
this list somewhere in WalSndWaitForStandbyConfirmation(), probably in
WalSndGetStandbySlots, that will make the code look neat and also
avoid allocating this list when failover is not enabled for the slot.

6.
+/* ALTER_REPLICATION_SLOT slot */
+alter_replication_slot:
+ K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')'

I think you need to update the docs for this new command. See existing docs [1].

[1] - https://www.postgresql.org/docs/devel/protocol-replication.html

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2023-11-03 11:32:44 Re: Intermittent failure with t/003_logical_slots.pl test on windows
Previous Message Matthias van de Meent 2023-11-03 11:16:05 Re: Popcount optimization using AVX512