Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-07 04:00:13
Message-ID: CAHut+PvtysbVd8tj2AADk=eNo0VY9Ov9wkBP-K+9tj1wRS4M4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v79-0001

======
Commit message

1.
The logical replication slots on the primary can be synchronized to the hot
standby by enabling the "failover" parameter during
pg_create_logical_replication_slot() or by enabling "failover" option of the
CREATE SUBSCRIPTION command and calling pg_sync_replication_slots() function
on the standby.

~

SUGGESTION
The logical replication slots on the primary can be synchronized to
the hot standby by enabling failover during slot creation (e.g. using
the "failover" parameter of pg_create_logical_replication_slot(), or
using the "failover" option of the CREATE SUBSCRIPTION command), and
then calling pg_sync_replication_slots() function on the standby.

======

2.
+ <caution>
+ <para>
+ If after executing the function, hot_standby_feedback is disabled on
+ the standby or the physical slot configured in primary_slot_name is
+ removed, then it is possible that the necessary rows of the
+ synchronized slot will be removed by the VACUUM process on
the primary
+ server, resulting in the synchronized slot becoming invalidated.
+ </para>
+ </caution>

2a.
/If after/If, after/

~

2b.
Use SGML <variable> for the GUC names (hot_standby_feedback, and
primary_slot_name), and consider putting links for them as well.

======
src/sgml/logicaldecoding.sgml

3.
+ <sect2 id="logicaldecoding-replication-slots-synchronization">
+ <title>Replication Slot Synchronization</title>
+ <para>
+ A logical replication slot on the primary can be synchronized to the hot
+ standby by enabling the <literal>failover</literal> option for the slot
+ and calling <function>pg_sync_replication_slots</function>
+ on the standby. The <literal>failover</literal> option of the slot
+ can be enabled either by enabling
+ <link linkend="sql-createsubscription-params-with-failover">
+ <literal>failover</literal></link>
+ option during subscription creation or by providing
<literal>failover</literal>
+ parameter during
+ <link linkend="pg-create-logical-replication-slot">
+ <function>pg_create_logical_replication_slot</function></link>.

IMO it will be better to slightly reword this (like was suggested for
the Commit Message). I felt it is also better to refer/link to "CREATE
SUBSCRIPTION" instead of saying "during subscription creation".

SUGGESTION
The logical replication slots on the primary can be synchronized to
the hot standby by enabling failover during slot creation (e.g. using
the "failover" parameter of pg_create_logical_replication_slot, or
using the "failover" option of the CREATE SUBSCRIPTION command), and
then calling pg_sync_replication_slots() function on the standby.

~~~

4.
+ There are chances that the old primary is up again during the promotion
+ and if subscriptions are not disabled, the logical subscribers may keep
+ on receiving the data from the old primary server even after promotion
+ until the connection string is altered. This may result in the data
+ inconsistency issues and thus the logical subscribers may not be able
+ to continue the replication from the new primary server.
+ </para>

4a.
/There are chances/There is a chance/

/may keep on receiving the data/may continue to receive data/

~

4b.
BEFORE
This may result in the data inconsistency issues and thus the logical
subscribers may not be able to continue the replication from the new
primary server.

SUGGESTION
This might result in data inconsistency issues, preventing the logical
subscribers from being able to continue replication from the new
primary server.

~

4c.
I felt this whole part "There is a chance..." should be rendered as a
<note> or a <caution> or something.

======
src/backend/replication/logical/slotsync.c

5.
+/*
+ * Return true if all necessary GUCs for slot synchronization are set
+ * appropriately, otherwise return false.
+ */
+bool
+ValidateSlotSyncParams(void)
+{
+ char *dbname;
+
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
+ {
+ ereport(ERROR,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_slot_name"));
+ return false;
+ }
+
+ /*
+ * hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ {
+ ereport(ERROR,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
+ return false;
+ }
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"wal_level\" must be >= logical."));
+ return false;
+ }
+
+ /*
+ * The primary_conninfo is required to make connection to primary for
+ * getting slots information.
+ */
+ if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')
+ {
+ ereport(ERROR,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_conninfo"));
+ return false;
+ }
+
+ /*
+ * The slot synchronization needs a database connection for walrcv_exec to
+ * work.
+ */
+ dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (dbname == NULL)
+ {
+ ereport(ERROR,
+
+ /*
+ * translator: 'dbname' is a specific option; %s is a GUC variable
+ * name
+ */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("'dbname' must be specified in \"%s\".", "primary_conninfo"));
+ return false;
+ }
+
+ return true;
+}

The code of this function has been flip-flopping between versions.
Now, it is always giving an ERROR when something is wrong, so all of
the "return false" are unreachable. It also means the function comment
is wrong, and the boolean return is unused/unnecessary.

~~~

6. SlotSyncShmemInit

+/*
+ * Allocate and initialize slot sync shared memory.
+ */

This comment should use the same style wording as the other nearby
shmem function comments.

SUGGESTION
Allocate and initialize the shared memory of slot synchronization.

~~~

7.
+/*
+ * Cleanup the shared memory of slot synchronization.
+ */
+static void
+SlotSyncShmemExit(int code, Datum arg)

Since this is static, should it use the snake case naming convention?
-- e.g. slot_sync_shmem_exit.

~~~

8.
+/*
+ * Register the callback function to clean up the shared memory of slot
+ * synchronization.
+ */
+void
+SlotSyncInitialize(void)
+{
+ before_shmem_exit(SlotSyncShmemExit, 0);
+}

This is only doing registration for cleanup of shmem stuff. So, does
it really need it to be a separate function, or can this be registered
within SlotSyncShmemInit() itself?

~~~

9. SyncReplicationSlots

+ PG_TRY();
+ {
+ validate_primary_slot_name(wrconn);
+
+ (void) synchronize_slots(wrconn);
+ }
+ PG_FINALLY();
+ {
+ if (syncing_slots)
+ {
+ SpinLockAcquire(&SlotSyncCtx->mutex);
+ SlotSyncCtx->syncing = false;
+ SpinLockRelease(&SlotSyncCtx->mutex);
+
+ syncing_slots = false;
+ }
+
+ walrcv_disconnect(wrconn);
+ }
+ PG_END_TRY();

IIUC, the "if (syncing_slots)" part is not really for normal
operation, but it is a safe-guard for cleaning up if some unexpected
ERROR happens. Maybe there should be a comment to say that.

======
src/test/recovery/t/040_standby_failover_slots_sync.pl

10.
+# Confirm that the logical failover slot is created on the standby and is
+# flagged as 'synced'
+is($standby1->safe_psql('postgres',
+ q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN
('lsub1_slot', 'lsub2_slot') AND synced;}),
+ "t",
+ 'logical slots have synced as true on standby');

/slot is created/slots are created/

/and is flagged/and are flagged/

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2024-02-07 04:08:24 Re: RFC: Logging plan of the running query
Previous Message Ian Lawrence Barwick 2024-02-07 03:51:06 Re: Allow passing extra options to initdb for tests