Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-09 01:17:27
Message-ID: CAHut+PvbbPz1=T4bzY0_GotUK460Eih41Twjt=czJ1z2J8SGEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v57-0001.

======
doc/src/sgml/protocol.sgml

1. CREATE_REPLICATION_SLOT ... FAILOVER

+ <varlistentry>
+ <term><literal>FAILOVER [ <replaceable
class="parameter">boolean</replaceable> ]</literal></term>
+ <listitem>
+ <para>
+ If true, the slot is enabled to be synced to the physical
+ standbys so that logical replication can be resumed after failover.
+ </para>
+ </listitem>
+ </varlistentry>

This syntax says passing the boolean value is optional. So the default
needs to the specified here in the docs (like what the TWO_PHASE
option does).

~~~

2. ALTER_REPLICATION_SLOT ... FAILOVER

+ <variablelist>
+ <varlistentry>
+ <term><literal>FAILOVER [ <replaceable
class="parameter">boolean</replaceable> ]</literal></term>
+ <listitem>
+ <para>
+ If true, the slot is enabled to be synced to the physical
+ standbys so that logical replication can be resumed after failover.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>

This syntax says passing the boolean value is optional. So it needs to
be specified here in the docs that not passing a value would be the
same as passing the value true.

======
doc/src/sgml/ref/alter_subscription.sgml

3.
+ If <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+ is enabled, you can temporarily disable it in order to execute
these commands.

/in order to/to/

~~~

4.
+ <para>
+ When altering the
+ <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>,
+ the <literal>failover</literal> property of the new slot may
differ from the
+ <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+ parameter specified in the subscription. When creating the slot,
+ ensure the slot failover property matches the
+ <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+ parameter value of the subscription.
+ </para>

4a.
the <literal>failover</literal> property of the new slot may differ

Maybe it would be more clear if that said "the failover property value
of the named slot...".

~

4b.
In the "failover property matches" part should that failover also be
rendered as <literal> like before in the same paragraph?

======
doc/src/sgml/system-views.sgml

5.
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>failover</structfield> <type>bool</type>
+ </para>
+ <para>
+ True if this logical slot is enabled to be synced to the physical
+ standbys so that logical replication can be resumed from the new primary
+ after failover. Always false for physical slots.
+ </para></entry>
+ </row>

/True if this logical slot is enabled.../True if this is a logical
slot enabled.../

======
src/backend/commands/subscriptioncmds.c

6. CreateSubscription

+ /*
+ * Even if failover is set, don't create the slot with failover
+ * enabled. Will enable it once all the tables are synced and
+ * ready. The intention is that if failover happens at the time of
+ * table-sync, user should re-launch the subscription instead of
+ * relying on main slot (if synced) with no table-sync data
+ * present. When the subscription has no tables, leave failover as
+ * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
+ * work.
+ */
+ if (opts.failover && !opts.copy_data && tables != NIL)
+ failover_enabled = true;

AFAICT it might be possible for this to set failover_enabled = true if
copy_data is false. So failover_enabled would be true when later
calling:
walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled,
failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL);

Isn't that contrary to what this comment said: "Even if failover is
set, don't create the slot with failover enabled"

~~~

7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS:

+ if (!sub->slotname)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set failover for subscription that does not have slot name")));

/for subscription that does not have slot name/for a subscription that
does not have a slot name/

======
.../libpqwalreceiver/libpqwalreceiver.c

8.
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("could not alter replication slot \"%s\"",
+ slotname)));

This used to display the error message like
pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK?

======
src/backend/replication/logical/tablesync.c

9.
+ if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING)
+ ereport(LOG,
+ /* translator: %s is a subscription option */
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that %s can be enabled",
+ MySubscription->name, "two_phase")));
+
+ if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING)
+ ereport(LOG,
+ /* translator: %s is a subscription option */
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that %s can be enabled",
+ MySubscription->name, "failover")));

Those errors have multiple %s, so the translator's comment should say
"the 2nd %s is a..."

~~~

10.
void
-UpdateTwoPhaseState(Oid suboid, char new_state)
+EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase,
+ bool enable_failover)

I felt the function name was a bit confusing. Maybe it is simpler to
call it like "EnableTriState" or "EnableSubTriState" -- the parameters
anyway specify what actual state(s) will be set.

======
src/backend/replication/logical/worker.c

11.
+ /* Update twophase and/or failover */
+ EnableTwoPhaseFailoverTriState(MySubscription->oid, twophase_pending,
+ failover_pending);
+ if (twophase_pending)
+ MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED;
+
+ if (failover_pending)
+ MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED;

Can't you pass the MySubscription as a parameter and then the
EnableTwoPhaseFailoverTriState can also set these
LOGICALREP_TWOPHASE_STATE_ENABLED/LOGICALREP_FAILOVER_STATE_ENABLED
states within the Enable* function?

======
src/backend/replication/repl_gram.y

12.
%token K_CREATE_REPLICATION_SLOT
%token K_DROP_REPLICATION_SLOT
+%token K_ALTER_REPLICATION_SLOT

and

+ create_replication_slot drop_replication_slot
+ alter_replication_slot identify_system read_replication_slot
+ timeline_history show upload_manifest

and

| create_replication_slot
| drop_replication_slot
+ | alter_replication_slot

and

| K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
| K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
+ | K_ALTER_REPLICATION_SLOT { $$ = "alter_replication_slot"; }

etc.

~

Although it makes no difference IMO it is more natural to code
everything in the order: create, alter, drop.

======
src/backend/replication/repl_scanner.l

13.
CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; }
DROP_REPLICATION_SLOT { return K_DROP_REPLICATION_SLOT; }
+ALTER_REPLICATION_SLOT { return K_ALTER_REPLICATION_SLOT; }

and

case K_CREATE_REPLICATION_SLOT:
case K_DROP_REPLICATION_SLOT:
+ case K_ALTER_REPLICATION_SLOT:

Although it makes no difference IMO it is more natural to code
everything in the order: create, alter, drop.

======
src/backend/replication/slot.c

14.
+ if (SlotIsPhysical(MyReplicationSlot))
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use %s with a physical replication slot",
+ "ALTER_REPLICATION_SLOT"));

/with a/for a/

======
src/backend/replication/walsender.c

15.
+static void
+ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
+{
+ ListCell *lc;
+ bool failover_given = false;
+
+ /* Parse options */
+ foreach(lc, cmd->options)
+ {
+ DefElem *defel = (DefElem *) lfirst(lc);

AFAIK there are some new-style macros now you can use for this code.
e.g. foreach_ptr? See [1].

~~~

16.
+ if (strcmp(defel->defname, "failover") == 0)
+ {
+ if (failover_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ failover_given = true;
+ *failover = defGetBoolean(defel);
+ }

The documented syntax showed that passing the boolean value for the
FAILOVER option is not mandatory. Does this code work if the boolean
value is not passed?

======
src/bin/psql/tab-complete.c

17.
I think "ALTER SUBSCRIPTION ... SET (failover)" is possible, but the
ALTER SUBSCRIPTION tab completion code is missing.

======
src/include/nodes/replnodes.h

18.
+/* ----------------------
+ * ALTER_REPLICATION_SLOT command
+ * ----------------------
+ */
+typedef struct AlterReplicationSlotCmd
+{
+ NodeTag type;
+ char *slotname;
+ List *options;
+} AlterReplicationSlotCmd;
+
+

Same as an earlier comment. Although it makes no difference IMO it is
more natural to define these structs in the order:
CreateReplicationSlotCmd, then AlterReplicationSlotCmd, then
DropReplicationSlotCmd.

======
.../t/050_standby_failover_slots_sync.pl

19.
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+

/2023/2024/

~~~

20.
+# Create another subscription (using the same slot created above) that enables
+# failover.
+$subscriber1->safe_psql(
+ 'postgres', qq[
+ CREATE TABLE tab_int (a int PRIMARY KEY);
+ CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr'
PUBLICATION regress_mypub WITH (slot_name = lsub1_slot,
copy_data=false, failover = true, create_slot = false);

The comment should not say "Create another subscription" because this
is the first subscription being created.

/another/a/

~~~

21.
+##################################################
+# Test if changing the failover property of a subscription updates the
+# corresponding failover property of the slot.
+##################################################

/Test if/Test that/

======
src/test/regress/sql/subscription.sql

22.
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, failover = true);
+
+\dRs+

This is currently only testing the explicit "failover=true".

Maybe you can also test the other kinds work as expected:
- explicit "SET (failover=false)"
- explicit "SET (failover)" with no value specified

======
[1] https://github.com/postgres/postgres/commit/14dd0f27d7cd56ffae9ecdbe324965073d01a9ff

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeremy Schneider 2024-01-09 01:17:48 Re: Built-in CTYPE provider
Previous Message Michael Paquier 2024-01-09 01:09:31 Re: Adding facility for injection points (or probe points?) for more advanced tests