Re: Synchronizing slots from primary to standby

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-10 06:26:08
Message-ID: CAFiTN-sREB=NMmK-9xD-jpVh1xCtnPOwRoi_J0Crrb9ebdsnZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 9, 2024 at 5:44 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
comments on 0002

1.
+/* Worker's nap time in case of regular activity on the primary server */
+#define WORKER_DEFAULT_NAPTIME_MS 10L /* 10 ms */
+
+/* Worker's nap time in case of no-activity on the primary server */
+#define WORKER_INACTIVITY_NAPTIME_MS 10000L /* 10 sec */

Instead of directly switching between 10ms to 10s shouldn't we
increase the nap time gradually? I mean it can go beyond 10 sec as
well but instead of directly switching
from 10ms to 10 sec we can increase it every time with some multiplier
and keep a max limit up to which it can grow. Although we can reset
back to 10ms directly as soon as
we observe some activity.

2.
SlotSyncWorkerCtxStruct add this to typedefs. list file

3.
+/*
+ * Update local slot metadata as per remote_slot's positions
+ */
+static void
+local_slot_update(RemoteSlot *remote_slot)
+{
+ Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+ LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn);
+ LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn,
+ remote_slot->catalog_xmin);
+ LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn,
+ remote_slot->restart_lsn);
+}

IIUC on the standby we just want to overwrite what we get from primary
no? If so why we are using those APIs that are meant for the actual
decoding slots where it needs to take certain logical decisions
instead of mere overwriting?

4.
+/*
+ * Helper function for drop_obsolete_slots()
+ *
+ * Drops synced slot identified by the passed in name.
+ */
+static void
+drop_synced_slots_internal(const char *name, bool nowait)

Suggestion to add one line to explain no wait in the header

5.
+/*
+ * Helper function to check if local_slot is present in remote_slots list.
+ *
+ * It also checks if logical slot is locally invalidated i.e. invalidated on
+ * the standby but valid on the primary server. If found so, it sets
+ * locally_invalidated to true.
+ */

Instead of saying "but valid on the primary server" better to mention
it in the remote_slots list, because here this function is just
checking the remote_slots list regardless of whether the list came
from. Mentioning primary
seems like it might fetch directly from the primary in this function
so this is a bit confusing.

6.
+/*
+ * Check that all necessary GUCs for slot synchronization are set
+ * appropriately. If not, raise an ERROR.
+ */
+static void
+validate_slotsync_parameters(char **dbname)

The function name just says 'validate_slotsync_parameters' but it also
gets the dbname so I think it better we change the name accordingly
also instead of passing dbname as a parameter just return it directly.
There
is no need to pass this extra parameter and make the function return void.

7.
+ tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+ tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, tupslot);
+ Assert(tuple_ok); /* It must return one tuple */

Comments say 'It must return one tuple' but asserting just for at
least one tuple shouldn't we enhance assert so that it checks that we
got exactly one tuple?

8.
/* No need to check further, return that we are cascading standby */
+ *am_cascading_standby = true;

we are not returning immediately we are just setting
am_cascading_standby to true so adjust comments accordingly

9.
+ /* No need to check further, return that we are cascading standby */
+ *am_cascading_standby = true;
+ }
+ else
+ {
+ /* We are a normal standby. */

Single-line comments do not follow the uniform pattern for the full
stop, either use a full stop for all single-line comments or none, at
least follow the same rule in a file or nearby comments.

10.
+ errmsg("exiting from slot synchronization due to bad configuration"),
+ errhint("%s must be defined.", "primary_slot_name"));

Why we are using the constant string "primary_slot_name" as a variable
in this error formatting?

11.
+ /*
+ * Hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.

I do not like capitalizing the first letter of the
'hot_standby_feedback' which is a GUC parameter

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-01-10 06:29:30 Re: Custom explain options
Previous Message Tom Lane 2024-01-10 06:25:36 Re: Add BF member koel-like indentation checks to SanityCheck CI