Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-07 08:03:05
Message-ID: CAHut+PuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv+zcT-f0GA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Here are my review comments for patch v43-0002.

======
Commit message

1.
The nap time of worker is tuned according to the activity on the primary.
The worker starts with nap time of 10ms and if no activity is observed on
the primary for some time, then nap time is increased to 10sec. And if
activity is observed again, nap time is reduced back to 10ms.

~
/nap time of worker/nap time of the worker/
/And if/If/

~~~

2.
Slots synced on the standby can be identified using 'sync_state' column of
pg_replication_slots view. The values are:
'n': none for user slots,
'i': sync initiated for the slot but waiting for the remote slot on the
primary server to catch up.
'r': ready for periodic syncs.

~

/identified using/identified using the/

The meaning of "identified by" is unclear to me. It also seems to
clash with later descriptions in system-views.sgml. Please see my
later review comment about it (in the sgml file)

======
doc/src/sgml/bgworker.sgml

3.
bgw_start_time is the server state during which postgres should start
the process; it can be one of BgWorkerStart_PostmasterStart (start as
soon as postgres itself has finished its own initialization; processes
requesting this are not eligible for database connections),
BgWorkerStart_ConsistentState (start as soon as a consistent state has
been reached in a hot standby, allowing processes to connect to
databases and run read-only queries), and
BgWorkerStart_RecoveryFinished (start as soon as the system has
entered normal read-write state. Note that the
BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are
equivalent in a server that's not a hot standby), and
BgWorkerStart_ConsistentState_HotStandby (same meaning as
BgWorkerStart_ConsistentState but it is more strict in terms of the
server i.e. start the worker only if it is hot-standby; if it is
consistent state in non-standby, worker will not be started). Note
that this setting only indicates when the processes are to be started;
they do not stop when a different state is reached.

~

3a.
This seems to have grown to become just one enormous sentence that is
too hard to read. IMO this should be changed to be a <variablelist> of
possible values instead of a big slab of text. I suspect it could also
be simplified quite a lot -- something like below

SUGGESTION
bgw_start_time is the server state during which postgres should start
the process. Note that this setting only indicates when the processes
are to be started; they do not stop when a different state is reached.
Possible values are:

- BgWorkerStart_PostmasterStart (start as soon as postgres itself has
finished its own initialization; processes requesting this are not
eligible for database connections)

- BgWorkerStart_ConsistentState (start as soon as a consistent state
has been reached in a hot-standby, allowing processes to connect to
databases and run read-only queries)

- BgWorkerStart_RecoveryFinished (start as soon as the system has
entered normal read-write state. Note that the
BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are
equivalent in a server that's not a hot standby)

- BgWorkerStart_ConsistentState_HotStandby (same meaning as
BgWorkerStart_ConsistentState but it is more strict in terms of the
server i.e. start the worker only if it is hot-standby; if it is a
consistent state in non-standby, the worker will not be started).

~~~

3b.
"i.e. start the worker only if it is hot-standby; if it is consistent
state in non-standby, worker will not be started"

~

Why is it even necessary to say the 2nd part "if it is consistent
state in non-standby, worker will not be started". It seems redundant
given 1st part says the same, right?

======
doc/src/sgml/config.sgml

4.
+ <para>
+ The standbys corresponding to the physical replication slots in
+ <varname>standby_slot_names</varname> must enable
+ <varname>enable_syncslot</varname> for the standbys to receive
+ failover logical slots changes from the primary.
+ </para>

4a.
Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like:

"must enable slot synchronization (see enable_syncslot)"

OR

"must configure enable_syncslot = true"

~~~

4b.
(seems like repetitive use of "the standbys")

/for the standbys to/to/

OR

/for the standbys to/so they can/

~~~

5.
<varname>primary_conninfo</varname> string, or in a separate
- <filename>~/.pgpass</filename> file on the standby server (use
+ <filename>~/.pgpass</filename> file on the standby server. (use

This rearranged period seems unrelated to the current patch. Maybe
don't touch this.

~~~

6.
+ <para>
+ Specify <literal>dbname</literal> in
+ <varname>primary_conninfo</varname> string to allow synchronization
+ of slots from the primary server to the standby server.
+ This will only be used for slot synchronization. It is ignored
+ for streaming.
</para>

The wording "to allow synchronization of slots" seemed misleading to
me. Isn't that more the purpose of the 'enable_syncslot' GUC? I think
the intended wording is more like below:

SUGGESTION
If slot synchronization is enabled then it is also necessary to
specify <literal>dbname</literal> in the
<varname>primary_conninfo</varname> string. This will only be used for
slot synchronization. It is ignored for streaming.

======
doc/src/sgml/logicaldecoding.sgml

7.
+ <para>
+ A logical replication slot on the primary can be synchronized to the hot
+ standby by enabling the failover option during slot creation and set
+ <varname>enable_syncslot</varname> on the standby. For the synchronization
+ to work, it is mandatory to have physical replication slot between the
+ primary and the standby. This physical replication slot for the standby
+ should be listed in <varname>standby_slot_names</varname> on the primary
+ to prevent the subscriber from consuming changes faster than the hot
+ standby. Additionally, similar to creating a logical replication slot
+ on the hot standby, <varname>hot_standby_feedback</varname> should be
+ set on the standby and a physical slot between the primary and the standby
+ should be used.
+ </para>

7a.
/creation and set/creation and setting/
/to have physical replication/to have a physical replication/

~

7b.
It's unclear why this is saying "should be listed in
standby_slot_names" and "hot_standby_feedback should be set on the
standby". Why is it saying "should" instead of MUST -- are these
optional? I thought the GUC validation function mandates these (???).

~

7c.
Why does the paragraph say "and a physical slot between the primary
and the standby should be used."; isn't that exactly what was already
written earlier ("For the synchronization to work, it is mandatory to
have physical replication slot between the primary and the standby"

~~~

8.
+ <para>
+ By enabling synchronization of slots, logical replication can be resumed
+ after failover depending upon the
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield>
+ for the synchronized slots on the standby at the time of failover.
+ The slots which were in ready sync_state ('r') on the standby before
+ failover can be used for logical replication after failover. However,
+ the slots which were in initiated sync_state ('i) and were not
+ sync-ready ('r') at the time of failover will be dropped and logical
+ replication for such slots can not be resumed after failover. This applies
+ to the case where a logical subscription is disabled before
failover and is
+ enabled after failover. If the synchronized slot due to disabled
+ subscription could not be made sync-ready ('r') on standby, then the
+ subscription can not be resumed after failover even when enabled.

8a.
This feels overcomplicated -- too much information?

SUGGESTION
depending upon the ... sync_state for the synchronized slots on the
standby at the time of failover. Only slots that were in ready
sync_state ('r') on the standby before failover can be used for
logical replication after failover

~~~

8b.
+ the slots which were in initiated sync_state ('i) and were not
+ sync-ready ('r') at the time of failover will be dropped and logical
+ replication for such slots can not be resumed after failover. This applies
+ to the case where a logical subscription is disabled before
failover and is
+ enabled after failover. If the synchronized slot due to disabled
+ subscription could not be made sync-ready ('r') on standby, then the
+ subscription can not be resumed after failover even when enabled.

But isn't ALL that part pretty much redundant information for the
user? I thought these are not ready state, so they are not usable...
End-Of-Story. Isn't everything else just more like implementation
details, which the user does not need to know about?

~~~

9.
+ If the primary is idle, making the synchronized slot on the standby
+ as sync-ready ('r') for enabled subscription may take noticeable time.
+ This can be sped up by calling the
+ <function>pg_log_standby_snapshot</function> function on the primary.
+ </para>

SUGGESTION
If the primary is idle, then the synchronized slots on the standby may
take a noticeable time to reach the ready ('r') sync_state. This can
be sped up by calling the
<function>pg_log_standby_snapshot</function> function on the primary.

======
doc/src/sgml/system-views.sgml

10.
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>sync_state</structfield> <type>char</type>
+ </para>
+ <para>
+ Defines slot synchronization state. This is meaningful on the physical
+ standby which has enabled slots synchronization.
+ </para>

I felt that this part "which has enabled slots synchronization" should
cross-reference to the 'sync_enabled' GUC.

~~~

11.
+ <para>
+ State code:
+ <literal>n</literal> = none for user created slots,
+ <literal>i</literal> = sync initiated for the slot but slot is not ready
+ yet for periodic syncs,
+ <literal>r</literal> = ready for periodic syncs.
+ </para>

I'm wondering why don't we just reuse 'd' (disabled), 'p' (pending),
'e' (enabled) like the other tri-state attributes are using.

~~~

12.
+ <para>
+ The hot standby can have any of these sync_state for the slots but on a
+ hot standby, the slots with state 'r' and 'i' can neither be
used for logical
+ decoded nor dropped by the user. The primary server will have sync_state
+ as 'n' for all the slots. But if the standby is promoted to become the
+ new primary server, sync_state can be seen 'r' as well. On this new
+ primary server, slots with sync_state as 'r' and 'n' will
behave the same.
+ </para></entry>
+ </row>

12a.
/logical decoded/logical decoding/

~

12b.
"sync_state as 'r' and 'n' will behave the same" sounds kind of hacky.
Is there no alternative?

Anyway, IMO mentioning about primary server states seems overkill,
because you already said "This is meaningful on the physical standby"
which I took as implying that it is *not* meaningful from the POV of
the primary server.

In light of this, I'm wondering if a better name for this attribute
would be: 'standby_sync_state'

======
src/backend/access/transam/xlogrecovery.c

13.
+ /*
+ * Shutdown the slot sync workers to prevent potential conflicts between
+ * user processes and slotsync workers after a promotion. Additionally,
+ * drop any slots that have initiated but not yet completed the sync
+ * process.
+ */
+ ShutDownSlotSync();
+ slotsync_drop_initiated_slots();
+

Is this where maybe the 'sync_state' should also be updated for
everything so you are not left with confusion about different states
on a node that is no longer a standby node?

======
src/backend/postmaster/postmaster.c

14. PostmasterMain

ApplyLauncherRegister();

+ SlotSyncWorkerRegister();
+

Every other function call here is heavily commented but there is a
conspicuous absence of a comment here.

~~~

15. bgworker_should_start_now

if (start_time == BgWorkerStart_ConsistentState)
return true;
+ else if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
+ pmState != PM_RUN)
+ return true;
/* fall through */
Change "else if" to "if" would be simpler.

======
.../libpqwalreceiver/libpqwalreceiver.c

16.
+ for (opt = opts; opt->keyword != NULL; ++opt)
+ {
+ /*
+ * If multiple dbnames are specified, then the last one will be
+ * returned
+ */
+ if (strcmp(opt->keyword, "dbname") == 0 && opt->val &&
+ opt->val[0] != '\0')
+ dbname = pstrdup(opt->val);
+ }

This can use a tidier C99 style to declare 'opt' as the loop variable.

~~~

17.
static void
libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname,
- bool failover)
+ bool failover)

What is this change for? Or, if something is wrong with the indent
then anyway it should be fixed in patch 0001.

======
src/backend/replication/logical/logical.c

18.

+ /*
+ * Slots in state SYNCSLOT_STATE_INITIATED should have been dropped on
+ * promotion.
+ */
+ if (!RecoveryInProgress() && slot->data.sync_state ==
SYNCSLOT_STATE_INITIATED)
+ elog(ERROR, "replication slot \"%s\" was not synced completely from
the primary server",
+ NameStr(slot->data.name));
+
+ /*
+ * Do not allow consumption of a "synchronized" slot until the standby
+ * gets promoted.
+ */
+ if (RecoveryInProgress() && slot->data.sync_state != SYNCSLOT_STATE_NONE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use replication slot \"%s\" for logical decoding",
+ NameStr(slot->data.name)),
+ errdetail("This slot is being synced from the primary server."),
+ errhint("Specify another replication slot.")));
+

18a.

Instead of having !RecoveryInProgress() and RecoveryInProgress() in
separate conditions is the code simpler like:

SUGGESTION

if (RecoveryInProgress())
{
/* Do not allow ... */
if (slot->data.sync_state != SYNCSLOT_STATE_NONE) ...
}
else
{
/* Slots in state... */
if (slot->data.sync_state == SYNCSLOT_STATE_INITIATED) ...
}

~

18b.
Should the errdetail give the current state?

======
src/backend/replication/logical/slotsync.c

19.
+/*
+ * Number of attempts for wait_for_primary_slot_catchup() after
+ * which it aborts the wait and the slot sync worker then moves
+ * to the next slot creation/sync.
+ */
+#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5

Given this is only used within one static function, I'm wondering if
it would be tidier to also move this macro to within that function.

~~~

20. wait_for_primary_slot_catchup

+/*
+ * Wait for remote slot to pass locally reserved position.
+ *
+ * Ping and wait for the primary server for
+ * WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS during a slot creation, if it still
+ * does not catch up, abort the wait. The ones for which wait is aborted will
+ * attempt the wait and sync in the next sync-cycle.
+ *
+ * *persist will be set to false if the slot has disappeared or was invalidated
+ * on the primary; otherwise, it will be set to true.
+ */

20a.
The comment doesn't say the meaning of the boolean returned.

~

20b.
/*persist will be set/If passed, *persist will be set/

~~~

21.
+ appendStringInfo(&cmd,
+ "SELECT conflicting, restart_lsn, confirmed_flush_lsn,"
+ " catalog_xmin FROM pg_catalog.pg_replication_slots"
+ " WHERE slot_name = %s",
+ quote_literal_cstr(remote_slot->name));

Somehow, I felt it is more readable if the " FROM" starts on a new line.

e.g.
"SELECT conflicting, restart_lsn, confirmed_flush_lsn, catalog_xmin"
" FROM pg_catalog.pg_replication_slots"
" WHERE slot_name = %s"

~~~

22.
+ ereport(ERROR,
+ (errmsg("could not fetch slot info for slot \"%s\" from the"
+ " primary server: %s",
+ remote_slot->name, res->err)));

Perhaps the message can be shortened like:
"could not fetch slot \"%s\" info from the primary server: %s"

~~~

23.
+ ereport(WARNING,
+ (errmsg("slot \"%s\" disappeared from the primary server,"
+ " slot creation aborted", remote_slot->name)));

Would this be better split into parts?

SUGGESTION
errmsg "slot \"%s\" creation aborted"
errdetail "slot was not found on the primary server"

~~~

24.
+ ereport(WARNING,
+ (errmsg("slot \"%s\" invalidated on the primary server,"
+ " slot creation aborted", remote_slot->name)));

(similar to previous)

SUGGESTION
errmsg "slot \"%s\" creation aborted"
errdetail "slot was invalidated on the primary server"

~~~

25.
+ /*
+ * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin
+ * are expected to be valid/non-null.
+ */

SUGGESTION
Having got a valid restart_lsn, the confirmed_lsn and catalog_xmin are
expected to be valid/non-null.

~~~

26. slotsync_drop_initiated_slots

+/*
+ * Drop the slots for which sync is initiated but not yet completed
+ * i.e. they are still waiting for the primary server to catch up.
+ */

I found "waiting for the primary server to catch up" to be difficult
to understand without knowing the full details, but it is not really
described properly until a much larger comment that is buried in the
synchronize_one_slot(). So I think all this needs explanation up-front
in the file, which you can refer to. I have repeated this same review
comment in a couple of places.

~~~

27. get_local_synced_slot_names

+static List *
+get_local_synced_slot_names(void)
+{
+ List *localSyncedSlots = NIL;

27a.
It's not returning a list of "names" though, so is this an appropriate
function name?

~~~

27b.
Suggest just call that ('localSyncedSlots') differently.
- In slotsync_drop_initiated_slots() function they are just called 'slots'
- In drop_obsolete_slots() function it is called 'local_slot_list'

IMO it is better if all these are consistently named -- just all lists
'slots' or all 'local_slots' or whatever.

~~~

28. check_sync_slot_validity

+static bool
+check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots,
+ bool *locally_invalidated)

Somehow this wording "validity" seems like a misleading function name,
because the return value has nothing to do with the slot field
invalidated.

The validity/locally_invalidated stuff is a secondary return as a side
effect for the "true" case.

A more accurate function name would be more like check_sync_slot_on_remote().

~~~

29. check_sync_slot_validity

+static bool
+check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots,
+ bool *locally_invalidated)
+{
+ ListCell *cell;

There is inconsistent naming --

ListCell lc; ListCell cell; ListCell lc_slot; etc..

IMO the more complicated names aren't of much value -- probably
everything can be changed to 'lc' for consistency.

~~~

30. drop_obsolete_slots

+ /*
+ * Get the list of local 'synced' slot so that those not on remote could
+ * be dropped.
+ */

/slot/slots/

Also, I don't think it is necessary to say "so that those not on
remote could be dropped." -- That is already described in the function
comment and again in a comment later in the loop. That seems enough.
If the function name get_local_synced_slot_names() is improved a bit
the comment seems redundant because it is obvious from the function
name.

~~~

31.
+ foreach(lc_slot, local_slot_list)
+ {
+ ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc_slot);
+ bool local_exists = false;
+ bool locally_invalidated = false;
+
+ local_exists = check_sync_slot_validity(local_slot, remote_slot_list,
+ &locally_invalidated);

Shouldn't that 'local_exists' variable be called 'remote_exists'?
That's what the other comments seem to be saying.

~~~

32. construct_slot_query

+ appendStringInfo(s,
+ "SELECT slot_name, plugin, confirmed_flush_lsn,"
+ " restart_lsn, catalog_xmin, two_phase, failover,"
+ " database, pg_get_slot_invalidation_cause(slot_name)"
+ " FROM pg_catalog.pg_replication_slots"
+ " WHERE failover and sync_state != 'i'");

Just wondering if substituting the SYNCSLOT_STATE_INITIATED constant
here might be more appropriate than hardwiring 'i'. Why have a
constant but not use it?

~~~

33. synchronize_one_slot

+static void
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+ bool *slot_updated)
+{
+ ReplicationSlot *s;
+ char sync_state = 0;

33a.
It seems strange that the sync_state is initially assigned something
other than the 3 legal values. Should this be defaulting to
SYNCSLOT_STATE_NONE instead?

~

33b.
I think it is safer to default the *slot_updated = false; because the
code appears to assume it was false already which may or may not be
true.

~~~

34.
+ /*
+ * Make sure that concerned WAL is received before syncing slot to target
+ * lsn received from the primary server.
+ *
+ * This check should never pass as on the primary server, we have waited
+ * for the standby's confirmation before updating the logical slot.
+ */

Maybe this comment should mention up-front that it is just a "Sanity check:"

~~~

35.
+ /*
+ * With hot_standby_feedback enabled and invalidations handled
+ * apropriately as above, this should never happen.
+ */
+ if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
+ {
+ ereport(ERROR,
+ errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)"
+ " to remote slot's LSN(%X/%X) as synchronization "
+ " would move it backwards", remote_slot->name,
+ LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn),
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn)));
+
+ goto cleanup;
+ }

35a.
IIUC then this another comment that should say it is just a "Sanity-check:".

~

35b.
I was wondering if there should be Assert(hot_standby_feedback) here
also. The comment "With hot_standby_feedback enabled" is a bit vague
whereas including an Assert will clarify that it must be set.

~

35c.
Since it says "this should never happen" then it appears elog is more
appropriate than ereport because translations are not needed, right?

~

35d.
The ERROR will make that goto cleanup unreachable, won't it?

~~~

36.
+ /*
+ * Already existing slot but not ready (i.e. waiting for the primary
+ * server to catch-up), lets attempt to make it sync-ready now.
+ */

/lets/let's/

~~~

37.
+ /*
+ * Refer the slot creation part (last 'else' block) for more details
+ * on this wait.
+ */
+ if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn ||
+ TransactionIdPrecedes(remote_slot->catalog_xmin,
+ MyReplicationSlot->data.catalog_xmin))
+ {
+ if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL))
+ {
+ goto cleanup;
+ }
+ }

37a.
Having to jump forward to understand earlier code seems backward. IMO
there should be a big comment atop this module about this subject
which the comment here can just refer to. I will write more about this
topic later (below).

~

37b.
The extra code curly braces are not needed.

~~~

38.
+ ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready "
+ "now", remote_slot->name));

Better to put the whole errmsg() on a newline instead of splitting the
string like that.

~~~

39.
+ /* User created slot with the same name exists, raise ERROR. */
+ else if (sync_state == SYNCSLOT_STATE_NONE)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipping sync of slot \"%s\" as it is a user created"
+ " slot", remote_slot->name),
+ errdetail("This slot has failover enabled on the primary and"
+ " thus is sync candidate but user created slot with"
+ " the same name already exists on the standby")));
+ }

I felt it would be better to eliminate this case immediately up-front
when you first searched for the slot names. e.g. code like below. IIUC
this refactor also means the default sync_state can be assigned a
normal value (as I suggested above) instead of the strange assignment
to 0.

+ /* Search for the named slot */
+ if ((s = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ SpinLockAcquire(&s->mutex);
+ sync_state = s->data.sync_state;
+ SpinLockRelease(&s->mutex);

INSERT HERE
+ /* User-created slot with the same name exists, raise ERROR. */
+ if (sync_state == SYNCSLOT_STATE_NONE)
+ ereport(ERROR, ...
+ }

~~~

40.
+ /* Otherwise create the slot first. */
+ else
+ {

Insert a blank line above that comment for better readability (same as
done for earlier 'else' in this same function)

~~~

41.
+ ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
+ remote_slot->two_phase,
+ remote_slot->failover,
+ SYNCSLOT_STATE_INITIATED);
+
+ slot = MyReplicationSlot;

In hindsight, the prior if/else code blocks in this function also
could have done "slot = MyReplicationSlot;" same as this -- then the
code would be much less verbose.

~~~

42.
+ SpinLockAcquire(&slot->mutex);
+ slot->data.database = get_database_oid(remote_slot->database, false);
+
+ namestrcpy(&slot->data.plugin, remote_slot->plugin);
+ SpinLockRelease(&slot->mutex);

IMO the code would be more readable *without* a blank line here
because the mutexed block is more obvious.

~~~

43.
+ /*
+ * If the local restart_lsn and/or local catalog_xmin is ahead of
+ * those on the remote then we cannot create the local slot in sync
+ * with the primary server because that would mean moving the local
+ * slot backwards and we might not have WALs retained for old LSN. In
+ * this case we will wait for the primary server's restart_lsn and
+ * catalog_xmin to catch up with the local one before attempting the
+ * sync.
+ */

43a.
This comment describes some fundamental concepts about how this logic
works. I felt this and other comments like this should be at the top
of this slotsync.c file. Then anything that needs to mention about it
can refer to the top comment. For example, I also found other comments
like "... they are still waiting for the primary server to catch up."
to be difficult to understand without knowing these details, but I
think describing core design stuff up-front and saying "refer to the
comment atop the fil" probably would help a lot.

~

43b.
Should "wait for the primary server's restart_lsn and..." be "wait for
the primary server slot's restart_lsn and..." ?

~~~

44.
+ {
+ bool persist;
+
+ if (!wait_for_primary_slot_catchup(wrconn, remote_slot, &persist))
+ {
+ /*
+ * The remote slot didn't catch up to locally reserved
+ * position.
+ *
+ * We do not drop the slot because the restart_lsn can be
+ * ahead of the current location when recreating the slot in
+ * the next cycle. It may take more time to create such a
+ * slot. Therefore, we persist it (provided remote-slot is
+ * still valid) and attempt the wait and synchronization in
+ * the next cycle.
+ */
+ if (persist)
+ {
+ ReplicationSlotPersist();
+ *slot_updated = true;
+ }
+
+ goto cleanup;
+ }
+ }

Looking at the way this 'persist' parameter is used I felt is it too
complicated. IIUC the wait_for_primary_slot_catchup can only return
*persist = true (for a false return) when it has reached/exceeded the
number of retries and still not yet caught up. Why should
wait_for_primary_slot_catchup() pretend to know about persistence?

In other words, I thought a more meaningful parameter/variable name
(instead of 'persist') is something like 'wait_attempts_exceeded'. IMO
that will make wait_for_primary_slot_catchup() code easier, and here
you can just say like below, where the code matches the comment
better. Thoughts?

+ if (wait_attempts_exceeded)
+ {
+ ReplicationSlotPersist();
+ *slot_updated = true;
+ }

~~~

45.
+
+
+ /*
+ * Wait for primary is either not needed or is over. Update the lsns
+ * and mark the slot as READY for further syncs.
+ */

Double blank lines?

~~~

46.
+ ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready "
+ "now", remote_slot->name));
+ }
+
+cleanup:

Better to put the whole errmsg() on a newline instead of splitting the
string like that.

~~~

47. synchronize_slots

+/*
+ * Synchronize slots.
+ *
+ * Gets the failover logical slots info from the primary server and update
+ * the slots locally. Creates the slots if not present on the standby.
+ *
+ * Returns nap time for the next sync-cycle.
+ */
+static long
+synchronize_slots(WalReceiverConn *wrconn)

/update/updates/

~~~

48.
+ /* The primary_slot_name is not set yet or WALs not received yet */
+ SpinLockAcquire(&WalRcv->mutex);
+ if (!WalRcv ||
+ (WalRcv->slotname[0] == '\0') ||
+ XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
+ {
+ SpinLockRelease(&WalRcv->mutex);
+ return naptime;
+ }
+ SpinLockRelease(&WalRcv->mutex);

Just wondering if the scenario of "WALS not received" is a bit more
like "no activity" so perhaps the naptime returned should be
WORKER_INACTIVITY_NAPTIME_MS here?

~~~

49.
+ /* Construct query to get slots info from the primary server */
+ initStringInfo(&s);
+ construct_slot_query(&s);

I did not like the construct_slot_query() to be separated from this
function because it makes it too difficult to see if the slot_attr
numbers and column types in this function are correct w.r.t. that
query. IMO better when everything is in the same place where you can
see it all together. e.g. Less risk of breaking something if changes
are made.

~~~

50.
+ /* Construct the remote_slot tuple and synchronize each slot locally */
+ slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);

Normally in all the other functions the variable 'slot' was the local
ReplicationSlot but IIUC here represents a remote tuple. Making a
different name would be better like 'remote_slottup' or something
else.

~~~

51.
+ /*
+ * If any of the slots get updated in this sync-cycle, retain default
+ * naptime and update 'last_update_time' in slot sync worker. But if no
+ * activity is observed in this sync-cycle, then increase naptime provided
+ * inactivity time reaches threshold.
+ */

I think "retain" is a slightly wrong word here because it might have
been WORKER_INACTIVITY_NAPTIME_MS in the previous cycle.

Maybe just /retain/use/

~~~

52.
+/*
+ * Connects primary to validate the slot specified in primary_slot_name.
+ *
+ * Exits the worker if physical slot with the specified name does not exist.
+ */
+static void
+validate_primary_slot(WalReceiverConn *wrconn)

There is already a connection, so not sure if this connect should be
saying "connects to"; Maybe is should be saying more like below:

SUGGESTION
Using the specified primary server connection, validate if the
physical slot identified by GUC primary_slot_name exists.

Exit the worker if the slot is not found.

~~~

53.
+ initStringInfo(&cmd);
+ appendStringInfo(&cmd,
+ "select count(*) = 1 from pg_replication_slots where "
+ "slot_type='physical' and slot_name=%s",
+ quote_literal_cstr(PrimarySlotName));

Write the SQL keywords in uppercase.

~~~

54.
+ if (res->status != WALRCV_OK_TUPLES)
+ ereport(ERROR,
+ (errmsg("could not fetch primary_slot_name info from the "
+ "primary: %s", res->err)));

Shouldn't the name of the unfound slot be shown in the ereport, or
will that already appear in the res->err?

~~~

55.
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as slot specified in "
+ "primary_slot_name is not valid"));
+

IMO the format should be the same as I suggested (later) for all the
validate_slotsync_parameters() errors.

Also, I think the name of the unfound slot needs to be in this message.

So maybe result is like this:

SUGGESTION

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: second %s is a GUC variable name */
errhint("The primary slot \"%s\" specified by %s is not valid.",
slot_name, "primary_slot_name")
);

~~~

56.
+/*
+ * Checks if GUCs are set appropriately before starting slot sync worker
+ */
+static void
+validate_slotsync_parameters(char **dbname)
+{
+ /*
+ * Since 'enable_syncslot' is ON, check that other GUC settings
+ * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo)
+ * are compatible with slot synchronization. If not, raise ERROR.
+ */
+

56a.
I thought that 2nd comment sort of belonged in the function comment.

~

56b.
It says "Since 'enable_syncslot' is ON", but I IIUC that is wrong
because the other function slotsync_reread_config() might detect a
change in this GUC and cause this validate_slotsync_parameters() to be
called when enable_syncslot was changed to false.

In other words, I think you also need to check 'enable_syncslot' and
exit with appropriate ERROR same as all the other config problems.

OTOH if this is not possible, then the slotsync_reread_config() might
need fixing instead.

~~~

57.
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0)
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as primary_slot_name is "
+ "not set"));
+
+ /*
+ * Hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as hot_standby_feedback "
+ "is off"));
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ errmsg("exiting slots synchronisation as it requires "
+ "wal_level >= logical"));
+
+ /*
+ * The primary_conninfo is required to make connection to primary for
+ * getting slots information.
+ */
+ if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as primary_conninfo "
+ "is not set"));
+
+ /*
+ * The slot sync worker needs a database connection for walrcv_exec to
+ * work.
+ */
+ *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (*dbname == NULL)
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as dbname is not "
+ "specified in primary_conninfo"));
+
+}

IMO all these errors can be improved by:
- using a common format
- including errhint for the reason
- using the same tone for instructions on what to do (e.g saying must
be set, rather than what was not set)

SUGGESTION (something like this)

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: %s is a GUC variable name */
errhint("%s must be defined.", "primary_slot_name")
);

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: %s is a GUC variable name */
errhint("%s must be enabled.", "hot_standby_feedback")
);

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: wal_level is a GUC variable name, 'logical' is a value */
errhint("wal_level must be >= logical.")
);

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: %s is a GUC variable name */
errhint("%s must be defined.", "primary_conninfo")
);

ereport(ERROR,
errmsg("exiting from slot synchronization due to bad configuration")
/* translator: 'dbname' is a specific option; %s is a GUC variable name */
errhint("'dbname' must be specified in %s.", "primary_conninfo")
);

~~~

58.
+ *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (*dbname == NULL)
+ ereport(ERROR,
+ errmsg("exiting slots synchronization as dbname is not specified in
primary_conninfo"));
+
+}

Unnecessary blank line at the end of the function

~~~

59.
+/*
+ * Re-read the config file.
+ *
+ * If any of the slot sync GUCs changed, validate the values again
+ * through validate_slotsync_parameters() which will exit the worker
+ * if validaity fails.
+ */

SUGGESTION
If any of the slot sync GUCs have changed, re-validate them. The
worker will exit if the check fails.

~~~

60.
+ char *conninfo = pstrdup(PrimaryConnInfo);
+ char *slotname = pstrdup(PrimarySlotName);
+ bool syncslot = enable_syncslot;
+ bool standbyfeedback = hot_standby_feedback;

For clarity, I would have used var names to match the old GUCs.

e.g.
/conninfo/old_primary_conninfo/
/slotname/old_primary_slot_name/
/syncslot/old_enable_syncslot/
/standbyfeedback/old_hot_standby_feedback/

~~~

61.
+ dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ Assert(dbname);

This code seems premature. IIUC this is only needed to detect that the
dbname was changed. But I think the prerequisite is first that the
conninfoChanged is true. So really this code should be guarded by if
(conninfoChanged) so it can be done later in the function.

~~~

62.
+ if (conninfoChanged || slotnameChanged ||
+ (syncslot != enable_syncslot) ||
+ (standbyfeedback != hot_standby_feedback))
+ {
+ revalidate = true;
+ }

SUGGESTION

revalidate = conninfoChanged || slotnameChanged ||
(syncslot != enable_syncslot) ||
(standbyfeedback != hot_standby_feedback);

~~~

63.
+ /*
+ * Since we have initialized this worker with old dbname, thus exit if
+ * dbname changed. Let it get restarted and connect to new dbname
+ * specified.
+ */
+ if (conninfoChanged && strcmp(dbname, new_dbname) != 0)
+ {
+ ereport(ERROR,
+ errmsg("exiting slot sync woker as dbname in "
+ "primary_conninfo changed"));
+ }

63a.
/old dbname/the old dbname/
/new dbname/the new dbname/
/woker/worker/

~

63b.
This code feels awkward. Can't this dbname check and accompanying
ERROR message be moved down into validate_slotsync_parameters(), so it
lives along with all the other GUC validation logic? Maybe you'll need
to change the validate_slotsync_parameters() parameters slightly but I
think it is much better to keep all the validation together.

~~~

64.
+
+
+/*
+ * Interrupt handler for main loop of slot sync worker.
+ */
+static void
+ProcessSlotSyncInterrupts(WalReceiverConn **wrconn)

Double blank lines.

~~~

65.
+
+
+ if (ConfigReloadPending)
+ slotsync_reread_config();
+}

Double blank lines

~~~

66. slotsync_worker_onexit

+static void
+slotsync_worker_onexit(int code, Datum arg)
+{
+ SpinLockAcquire(&SlotSyncWorker->mutex);
+ SlotSyncWorker->pid = 0;
+ SpinLockRelease(&SlotSyncWorker->mutex);
+}

Should assignment use InvalidPid (-1) instead of 0?

~~~

67. ReplSlotSyncWorkerMain

+ SpinLockAcquire(&SlotSyncWorker->mutex);
+
+ Assert(SlotSyncWorker->pid == 0);
+
+ /* Advertise our PID so that the startup process can kill us on promotion */
+ SlotSyncWorker->pid = MyProcPid;
+
+ SpinLockRelease(&SlotSyncWorker->mutex);

Shouldn't pid start as InvalidPid (-1) instead of Assert 0?

~~~

68.
+ /* Connect to the primary server */
+ wrconn = remote_connect();
+
+ /*
+ * Connect to primary and validate the slot specified in
+ * primary_slot_name.
+ */
+ validate_primary_slot(wrconn);

Maybe needs some slight rewording in the 2nd comment. "Connect to
primary server" is already said and done in the 1st part.

~~~

69. IsSlotSyncWorker

+/*
+ * Is current process the slot sync worker?
+ */
+bool
+IsSlotSyncWorker(void)
+{
+ return SlotSyncWorker->pid == MyProcPid;
+}

69a.
For consistency with others like it, I thought this be called
IsLogicalSlotSyncWorker().

~

69b.
For consistency with the others like this, I think the extern should
be declared in logicalworker.h

~~~

70. ShutDownSlotSync

+ SpinLockAcquire(&SlotSyncWorker->mutex);
+ if (!SlotSyncWorker->pid)
+ {
+ SpinLockRelease(&SlotSyncWorker->mutex);
+ return;
+ }

IMO should be comparing with InvalidPid (-1) here; not 0.

~~~

71.
+ SpinLockAcquire(&SlotSyncWorker->mutex);
+
+ /* Is it gone? */
+ if (!SlotSyncWorker->pid)
+ break;
+
+ SpinLockRelease(&SlotSyncWorker->mutex);

Ditto. bad pids should be InvalidPid (-1), not 0.

~~~

72. SlotSyncWorkerShmemInit

+ if (!found)
+ {
+ memset(SlotSyncWorker, 0, size);
+ SpinLockInit(&SlotSyncWorker->mutex);
+ }

Probably here the unassigned pid should be set to InvalidPid (-1), not 0.

~~~

73. SlotSyncWorkerRegister

+ if (!enable_syncslot)
+ {
+ ereport(LOG,
+ errmsg("skipping slots synchronization as enable_syncslot is "
+ "disabled."));
+ return;
+ }

/as/because/

======
src/backend/replication/logical/tablesync.c

74.
#include "commands/copy.h"
+#include "commands/subscriptioncmds.h"
#include "miscadmin.h"

There were only #include changes but no code changes. Is the #include needed?

======
src/backend/replication/slot.c

75. ReplicationSlotCreate

void
ReplicationSlotCreate(const char *name, bool db_specific,
ReplicationSlotPersistency persistency,
- bool two_phase, bool failover)
+ bool two_phase, bool failover, char sync_state)

The function comment goes to trouble to describe all the parameters
except for 'failover' and 'sync_slate'. I think a failover comment
should be added in patch 0001 and then the sync_state comment should
be added in patch 0002.

~~~

76.
+ /*
+ * Do not allow users to drop the slots which are currently being synced
+ * from the primary to the standby.
+ */
+ if (user_cmd && RecoveryInProgress() &&
+ MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE)
+ {
+ ReplicationSlotRelease();
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot drop replication slot \"%s\"", name),
+ errdetail("This slot is being synced from the primary.")));
+ }

Should the errdetail give the current state?

======
src/backend/tcop/postgres.c

77.
+ else if (IsSlotSyncWorker())
+ {
+ ereport(DEBUG1,
+ (errmsg_internal("replication slot sync worker is shutting down due
to administrator command")));
+
+ /*
+ * Slot sync worker can be stopped at any time.
+ * Use exit status 1 so the background worker is restarted.
+ */
+ proc_exit(1);
+ }

Explicitly saying "ereport(DEBUG1, errmsg_internal(..." is a bit
overkill; it is simpler to write this as "elog(DEBUG1, ....);

======
src/include/replication/slot.h

78.
+/* The possible values for 'sync_state' in ReplicationSlotPersistentData */
+#define SYNCSLOT_STATE_NONE 'n' /* None for user created slots */
+#define SYNCSLOT_STATE_INITIATED 'i' /* Sync initiated for the slot but
+ * not completed yet, waiting for
+ * the primary server to catch-up */
+#define SYNCSLOT_STATE_READY 'r' /* Initialization complete, ready
+ * to be synced further */

Already questioned the same elsewhere. IIUC the same tri-state values
of other attributes might be used here too without needing to
introduce 3 new values.

e.g.

#define SYNCSLOT_STATE_DISABLED 'd' /* No syncing for this slot */
#define SYNCSLOT_STATE_PENDING 'p' /* Sync is enabled but we must
wait for the primary server to catch up */
#define SYNCSLOT_STATE_ENABLED 'e' /* Sync is enabled and the slot is
ready to be synced */

~~~

79.
+ /*
+ * Is this a slot created by a sync-slot worker?
+ *
+ * Relevant for logical slots on the physical standby.
+ */
+ char sync_state;
+

I assumed that "Relevant for" means "Only relevant for". It should say that.

If correct, IMO a better field name might be 'standby_sync_state'

======
src/test/recovery/t/050_verify_slot_order.pl

80.
+$backup_name = 'backup2';
+$primary->backup($backup_name);
+
+# Create standby3
+my $standby3 = PostgreSQL::Test::Cluster->new('standby3');
+$standby3->init_from_backup(
+ $primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);

The mixture of 'backup2' for 'standby3' seems confusing. Is there a
reason to call it backup2?

~~~

81.
+# Verify slot properties on the standby
+is( $standby3->safe_psql('postgres',
+ q{SELECT failover, sync_state FROM pg_replication_slots WHERE
slot_name = 'lsub1_slot';}
+ ),
+ "t|r",
+ 'logical slot has sync_state as ready and failover as true on standby');

It might be better if the message has the same order as the SQL. Eg.
"failover as true and sync_state as ready".

~~~

82.
+# Verify slot properties on the primary
+is( $primary->safe_psql('postgres',
+ q{SELECT failover, sync_state FROM pg_replication_slots WHERE
slot_name = 'lsub1_slot';}
+ ),
+ "t|n",
+ 'logical slot has sync_state as none and failover as true on primary');
+

It might be better if the message has the same order as the SQL. Eg.
"failover as true and sync_state as none".

~~~

83.
+# Test to confirm that restart_lsn of the logical slot on the primary
is synced to the standby

IMO the major test parts (like this one) may need more highlighting "#
---------------------" so those comments don't get lost among all the
other comments.

~~~

84.
+# let the slots get synced on the standby
+sleep 2;

Won't this make the test prone to failure on slow machines? Is there
not a more deterministic way to wait for the sync?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2023-12-07 08:22:01 Re: Wrong results with grouping sets
Previous Message Drouvot, Bertrand 2023-12-07 07:53:36 Re: Synchronizing slots from primary to standby