Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-16 01:27:26
Message-ID: CAHut+PteZVNx1jQ6Hs3mEdoC=DNALVpJJ2mZDYim7sU-04tiaw@mail.gmail.com
Lists: pgsql-hackers

Here are some review comments for patch v61-0002

======
doc/src/sgml/logical-replication.sgml

1.
+ <sect2 id="logical-replication-failover-examples">
+ <title>Examples: logical replication failover</title>

The current documentation structure (after the patch is applied) looks
like this:

30.1. Publication
30.2. Subscription
30.2.1. Replication Slot Management
30.2.2. Examples: Set Up Logical Replication
30.2.3. Examples: Deferred Replication Slot Creation
30.2.4. Examples: logical replication failover

I don't think it is ideal.

Firstly, I think this new section is not just "Examples:"; it is more
like instructions for the steps to check whether a successful failover
is possible. IMO call it something like "Logical Replication Failover"
or "Replication Slot Failover".

Secondly, I don't think this new section strictly belongs underneath
the "Subscription" section anymore, because IMO it is just as much
about the promotion on the publisher side. Now that you are adding this
new (2nd) section about slots, I think the whole structure of this
document should be changed to something like below:

SUGGESTION #1 (make a new section 30.3 just for slot-related topics)

30.1. Publication
30.2. Subscription
30.2.1. Examples: Set Up Logical Replication
30.3. Logical Replication Slots
30.3.1. Replication Slot Management
30.3.2. Examples: Deferred Replication Slot Creation
30.3.3. Logical Replication Failover

~

SUGGESTION #2 (keep the existing structure, but give the failover its
own new section 30.3)

30.1. Publication
30.2. Subscription
30.2.1. Replication Slot Management
30.2.2. Examples: Set Up Logical Replication
30.2.3. Examples: Deferred Replication Slot Creation
30.3 Logical Replication Failover

~

SUGGESTION #2a (and maybe later you can extract some of the failover
examples further)

30.1. Publication
30.2. Subscription
30.2.1. Replication Slot Management
30.2.2. Examples: Set Up Logical Replication
30.2.3. Examples: Deferred Replication Slot Creation
30.3 Logical Replication Failover
30.3.1. Examples: Checking if failover is ready

~~~

2.
+ <para>
+ In a logical replication setup, if the publisher server is also the primary
+ server of the streaming replication, the logical slots on the
primary server
+ can be synchronized to the standby server by specifying
<literal>failover = true</literal>
+ when creating the subscription. Enabling failover ensures a seamless
+ transition of the subscription to the promoted standby, allowing it to
+ subscribe to the new primary server without any data loss.
+ </para>

I was initially confused by the wording. How about something like below:

SUGGESTION
When the publisher server is the primary server of a streaming
replication, the logical slots on that primary server can be
synchronized to the standby server by specifying <literal>failover =
true</literal> when creating subscriptions for those publications.
Enabling failover ensures a seamless transition of those subscriptions
after the standby is promoted. They can continue subscribing to
publications now on the new primary server without any data loss.
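
(Just as background for anyone reading along, enabling that looks
something like the below; the subscription/publication names and the
connection string are only made-up examples:)

-- names and connection string below are only illustrative
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary.example.com dbname=postgres'
    PUBLICATION mypub
    WITH (failover = true);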

~~~

3.
+ <para>
+ However, the replication slots are copied asynchronously, which
means it's necessary
+ to confirm that replication slots have been synced to the standby server
+ before the failover happens. Additionally, to ensure a successful failover,
+ the standby server must not lag behind the subscriber. To confirm
+ that the standby server is ready for failover, follow these steps:
+ </para>

Minor rewording

SUGGESTION
Because the slot synchronization logic copies asynchronously, it is
necessary to confirm that replication slots have been synced to the
standby server before the failover happens. Furthermore, to ensure a
successful failover, the standby server must not be lagging behind the
subscriber. To confirm that the standby server is indeed ready for
failover, follow these 2 steps:

~~~

4.
The instructions said "follow these steps", so the next parts should
be rendered as 2 "steps" (using <procedure> markup?)

SUGGESTION (show as steps 1 and 2, and also some minor rewording of
the step headings)

1. Confirm that all the necessary logical replication slots have been
synced to the standby server.
2. Confirm that the standby server is not lagging behind the subscribers.

~~~

5.
+ <para>
+ Check if all the necessary logical replication slots have been synced to
+ the standby server.
+ </para>

SUGGESTION
Confirm that all the necessary logical replication slots have been
synced to the standby server.

~~~

6.
+ <listitem>
+ <para>
+ On logical subscriber, fetch the slot names that should be synced to the
+ standby that we plan to promote.

SUGGESTION
Firstly, on the subscriber node, use the following SQL to identify the
slot names that should be...

~~~

7.
+<programlisting>
+test_sub=# SELECT
+ array_agg(slotname) AS slots
+ FROM
+ ((
+ SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid ||
'_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname
+ FROM pg_control_system() ctl, pg_subscription_rel r,
pg_subscription s
+ WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+ ) UNION (
+ SELECT oid AS subid, subslotname as slotname
+ FROM pg_subscription
+ WHERE subfailover
+ ));

7a.
Maybe this ought to schema-qualify the catalogs with "pg_catalog"?

~

7b.
For consistency, maybe it is better to also use a table alias ("FROM
pg_subscription s") in the UNION part.

~~~

8.
+ <listitem>
+ <para>
+ Check that the logical replication slots exist on the standby server.

SUGGESTION
Next, check that the logical replication slots identified above exist
on the standby server.

~~~

9.
+<programlisting>
+test_standby=# SELECT bool_and(synced AND NOT temporary AND
conflict_reason IS NULL) AS failover_ready
+ FROM pg_replication_slots
+ WHERE slot_name in ('slots');
+ failover_ready
+----------------
+ t

9a.
Maybe this ought to schema-qualify the catalogs with "pg_catalog"?

~

9b.
IIUC that 'slots' placeholder is supposed to be the slot names that
were found in the prior step. If so, then that point needs to be made
clear, and in any case the literal 'slots' does not match the 'sub'
name returned by your first SQL.
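
e.g. with 9a and 9b addressed, and assuming the first SQL returned just
the one slot called 'sub', the example might become (untested sketch):

SELECT bool_and(synced AND NOT temporary AND conflict_reason IS NULL) AS failover_ready
FROM pg_catalog.pg_replication_slots
WHERE slot_name IN ('sub');  -- 'sub' stands for whatever slot name(s) the first SQL returned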

~~~

10.
+ <listitem>
+ <para>
+ Query the last replayed WAL on the logical subscriber.

SUGGESTION
Firstly, on the subscriber node check the last replayed WAL.

~~~

11.
+<programlisting>
+test_sub=# SELECT
+ MAX(remote_lsn) AS remote_lsn_on_subscriber
+ FROM
+ ((
+ SELECT (CASE WHEN r.srsubstate = 'f' THEN
pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' ||
r.srrelid), false)
+ WHEN r.srsubstate = 's' THEN r.srsublsn
END) as remote_lsn
+ FROM pg_subscription_rel r, pg_subscription s
+ WHERE r.srsubstate IN ('f', 's') AND s.oid = r.srsubid
AND s.subfailover
+ ) UNION (
+ SELECT pg_replication_origin_progress(CONCAT('pg_' ||
s.oid), false) AS remote_lsn
+ FROM pg_subscription s
+ WHERE subfailover
+ ));

11a.
Maybe this ought to schema-qualify the catalogs with "pg_catalog"?

~

11b.
/WHERE subfailover/WHERE s.subfailover/
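
FWIW, with 11a and 11b applied the SQL might look like below (untested
sketch; again the "origin_lsns" subquery alias is added only so the
statement runs stand-alone):

SELECT
    MAX(remote_lsn) AS remote_lsn_on_subscriber
FROM
((
    SELECT (CASE WHEN r.srsubstate = 'f' THEN
                     pg_catalog.pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false)
                 WHEN r.srsubstate = 's' THEN r.srsublsn
            END) AS remote_lsn
    FROM pg_catalog.pg_subscription_rel r, pg_catalog.pg_subscription s
    WHERE r.srsubstate IN ('f', 's') AND s.oid = r.srsubid AND s.subfailover
) UNION (
    SELECT pg_catalog.pg_replication_origin_progress(CONCAT('pg_' || s.oid), false) AS remote_lsn
    FROM pg_catalog.pg_subscription s
    WHERE s.subfailover
)) AS origin_lsns;  -- alias added only to make the sketch stand-alone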

~~~

12.
+ <listitem>
+ <para>
+ On the standby server, check that the last-received WAL location
+ is ahead of the replayed WAL location on the subscriber.

SUGGESTION
Next, on the standby server check that the last-received WAL location
is ahead of the replayed WAL location on the subscriber identified
above.

~~~

13.
+</programlisting></para>
+ </listitem>
+ <listitem>
+ <para>
+ On the standby server, check that the last-received WAL location
+ is ahead of the replayed WAL location on the subscriber.
+<programlisting>
+test_standby=# SELECT pg_last_wal_receive_lsn() >=
'remote_lsn_on_subscriber'::pg_lsn AS failover_ready;
+ failover_ready
+----------------
+ t

IIUC the 'remote_lsn_on_subscriber' is supposed to be substituted with
the value found on the subscriber server. In this example maybe it
would be:
SELECT pg_last_wal_receive_lsn() >= '0/3000388'::pg_lsn AS failover_ready;

Maybe that point can be made more clearly.

~~~

14.
+ <para>
+ If the result (failover_ready) of both above steps is true, it means it is
+ okay to subscribe to the standby server.
+ </para>

14a.
failover_ready should be rendered using <literal> markup.

~

14b.
Does this say what you intended, or did you mean something more like
"the standby can be promoted and existing subscriptions will be able
to continue without data loss"?

======
src/backend/replication/logical/slotsync.c

15. local_slot_update

+/*
+ * Try to update local slot metadata based on the data from the remote slot.
+ *
+ * Return false if the data of the remote slot is the same as the local slot.
+ * Otherwise, return true.
+ */

There's not really any "try to" here; it either does it if needed or
doesn't do it because it's not needed.

SUGGESTION
If necessary, update local slot metadata based on the data from the remote slot.

If no update was needed (the data of the remote slot is the same as
the local slot), return false; otherwise return true.

~~~

16.
+ bool updated_xmin;
+ bool updated_restart;
+ Oid dbid;
+ ReplicationSlot *slot = MyReplicationSlot;
+
+ Assert(slot->data.invalidated == RS_INVAL_NONE);
+
+ updated_xmin = (remote_slot->catalog_xmin != slot->data.catalog_xmin);
+ updated_restart = (remote_slot->restart_lsn != slot->data.restart_lsn);
+ dbid = get_database_oid(remote_slot->database, false);
+
+ if (namestrcmp(&slot->data.plugin, remote_slot->plugin) == 0 &&
+ slot->data.database == dbid && !updated_restart && !updated_xmin &&
+ remote_slot->two_phase == slot->data.two_phase &&
+ remote_slot->failover == slot->data.failover &&
+ remote_slot->confirmed_lsn == slot->data.confirmed_flush)
+ return false;

It seems a bit strange to have boolean flags for some of the
differences (updated_xmin, updated_restart) but not for the others. I
expected it to be either all of them (e.g. updated_twophase,
updated_failover, ...) or none of them.

~~~

17. synchronize_one_slot

+ slot_updated = local_slot_update(remote_slot);
+
+ /* Make sure the slot changes persist across server restart */
+ if (slot_updated)
+ {
+ ReplicationSlotMarkDirty();
+ ReplicationSlotSave();
+ }

IMO this code would be simpler if written like below because then
'slot_updated' is only ever assigned when true instead of maybe
overwriting the default again with false:

SUGGESTION
/* Make sure the slot changes persist across server restart */
if (local_slot_update(remote_slot))
{
    slot_updated = true;
    ReplicationSlotMarkDirty();
    ReplicationSlotSave();
}

======
src/backend/replication/slot.c

18. ReplicationSlotPersist - TEMPORARY v EPHEMERAL

I noticed this ReplicationSlotPersist() change from v59-0002 was reverted:

- * Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot,
- * guaranteeing it will be there after an eventual crash.
+ * Convert a slot that's marked as RS_EPHEMERAL or RS_TEMPORARY to a
+ * RS_PERSISTENT slot, guaranteeing it will be there after an eventual crash.

AFAIK in v61 you are still calling this function for RS_TEMPORARY
slots, which is now contrary to the current function comment unless
you also change the comment to mention RS_TEMPORARY.

======
Kind Regards,
Peter Smith.
Fujitsu Australia
