Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-02-02 06:54:37
Message-ID: CAHut+PtRJ_4x3re0bPn791PTL6kc2TRm1A2EPY1kjTCax_9F=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v750002.

(this is a WIP but this is what I found so far...)

======
doc/src/sgml/protocol.sgml

1.
> > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
> >
> > <para>
> > - If true, the slot is enabled to be synced to the standbys.
> > + If true, the slot is enabled to be synced to the standbys
> > + so that logical replication can be resumed after failover.
> > </para>
> >
> > This also should have the sentence "The default is false.", e.g. the
> > same as the same option in CREATE_REPLICATION_SLOT says.
>
> I have not added this. I feel the default value related details should
> be present in the 'CREATE' part, it is not meaningful for the "ALTER"
> part. ALTER does not have any defaults, it just modifies the options
> given by the user.

You are correct. My mistake.

======
src/backend/postmaster/bgworker.c

2.
#include "replication/logicalworker.h"
+#include "replication/worker_internal.h"
#include "storage/dsm.h"

Is this change needed when the rest of the code is removed?

======
src/backend/replication/logical/slotsync.c

3. synchronize_one_slot

> > 3.
> > + /*
> > + * Make sure that concerned WAL is received and flushed before syncing
> > + * slot to target lsn received from the primary server.
> > + *
> > + * This check will never pass if on the primary server, user has
> > + * configured standby_slot_names GUC correctly, otherwise this can hit
> > + * frequently.
> > + */
> > + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> > + if (remote_slot->confirmed_lsn > latestFlushPtr)
> >
> > BEFORE
> > This check will never pass if on the primary server, user has
> > configured standby_slot_names GUC correctly, otherwise this can hit
> > frequently.
> >
> > SUGGESTION (simpler way to say the same thing?)
> > This will always be the case unless the standby_slot_names GUC is not
> > correctly configured on the primary server.
>
> It is not true. It will not hit this condition "always" but has higher
> chances to hit it when standby_slot_names is not configured. I think
> you meant 'unless the standby_slot_names GUC is correctly configured'.
> I feel the current comment gives clear info (less confusing) and thus
> I have not changed it for the time being. I can consider if I get more
> comments there.

Hmm. I meant what I wrote. The "This" of my suggested text refers to
the previous sentence in the comment (not about "hitting" ?? your
condition).

TBH, regardless of the wording you choose, I think it will be much
clearer to move the comment to be inside the if.

SUGGESTION
/*
* Make sure that concerned WAL is received and flushed before syncing
* slot to target lsn received from the primary server.
*/
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
/*
* Can get here only when if GUC 'standby_slot_names' on the primary
* server was not configured correctly.
*/
...
}

~~~

4.
+static bool
+validate_parameters_and_get_dbname(char **dbname)
+{
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_slot_name"));
+ return false;
+ }
+
+ /*
+ * hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
+ return false;
+ }
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"wal_level\" must be >= logical."));
+ return false;
+ }
+
+ /*
+ * The primary_conninfo is required to make connection to primary for
+ * getting slots information.
+ */
+ if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_conninfo"));
+ return false;
+ }
+
+ /*
+ * The slot sync worker needs a database connection for walrcv_exec to
+ * work.
+ */
+ *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (*dbname == NULL)
+ {
+ ereport(LOG,
+
+ /*
+ * translator: 'dbname' is a specific option; %s is a GUC variable
+ * name
+ */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("'dbname' must be specified in \"%s\".", "primary_conninfo"));
+ return false;
+ }
+
+ return true;
+}

I wonder if it is better to log all the problems in one go instead of
making users stumble onto them one at a time after fixing one and then
hitting the next problem. e.g. just set some variable "all_ok =
false;" each time instead of all the "return false;"

Then at the end of the function just "return all_ok;"

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-02 06:55:30 Re: Synchronizing slots from primary to standby
Previous Message Masahiko Sawada 2024-02-02 06:49:18 Re: Commitfest 2024-01 is now closed