Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-17 11:38:36
Message-ID: CAA4eK1+P9R3GO2rwGBg2EOh=uYjWUSEOHD8yvs4Je8WYa2RHag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> PFA v35.
>

Review v35-0002*
==============
1.
As quoted in the commit message,
>
If a logical slot is invalidated on the primary, slot on the standby is also
invalidated. If a logical slot on the primary is valid but is invalidated
on the standby due to conflict (say required rows removed on the primary),
then that slot is dropped and recreated on the standby in next sync-cycle.
It is okay to recreate such slots as long as these are not consumable on the
standby (which is the case currently).
>

I think this won't happen normally because of the physical slot and
hot_standby_feedback but probably can occur in cases like if the user
temporarily switches hot_standby_feedback from on to off. Are there
any other reasons? I think we can mention the cases along with it as
well at least for now. Additionally, I think this should be covered in
code comments as well.

2.
#include "postgres.h"
-
+#include "access/genam.h"

Spurious line removal.

3.
A password needs to be provided too, if the sender demands password
authentication. It can be provided in the
<varname>primary_conninfo</varname> string, or in a separate
- <filename>~/.pgpass</filename> file on the standby server (use
- <literal>replication</literal> as the database name).
- Do not specify a database name in the
- <varname>primary_conninfo</varname> string.
+ <filename>~/.pgpass</filename> file on the standby server.
+ </para>
+ <para>
+ Specify <literal>dbname</literal> in
+ <varname>primary_conninfo</varname> string to allow synchronization
+ of slots from the primary server to the standby server.
+ This will only be used for slot synchronization. It is ignored
+ for streaming.

Is there a reason to remove part of the earlier sentence "use
<literal>replication</literal> as the database name"?

4.
+ <primary><varname>enable_syncslot</varname> configuration
parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ It enables a physical standby to synchronize logical failover slots
+ from the primary server so that logical subscribers are not blocked
+ after failover.
+ </para>
+ <para>
+ It is enabled by default. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server
command line.
+ </para>

I think you forgot to update the documentation for the default value
of this variable.

5.
+ * a) start the logical replication workers for every enabled subscription
+ * when not in standby_mode
+ * b) start the slot-sync worker for logical failover slots synchronization
+ * from the primary server when in standby_mode.

Either use a full stop after both lines or none of these.

6.
+static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);

There shouldn't be space between * and the worker.

7.
+ if (!SlotSyncWorker->hdr.in_use)
+ {
+ LWLockRelease(SlotSyncWorkerLock);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("replication slot-sync worker not initialized, "
+ "cannot attach")));
+ }
+
+ if (SlotSyncWorker->hdr.proc)
+ {
+ LWLockRelease(SlotSyncWorkerLock);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("replication slot-sync worker is "
+ "already running, cannot attach")));
+ }

Using slot-sync in the error messages looks a bit odd to me. Can we
use "replication slot sync worker ..." in both these and other
similar messages? I think it would be better if we don't split the
messages into multiple lines in these cases as messages don't appear
too long to me.

8.
+/*
+ * Detach the worker from DSM and update 'proc' and 'in_use'.
+ * Logical replication launcher will come to know using these
+ * that the worker has shutdown.
+ */
+void
+slotsync_worker_detach(int code, Datum arg)
+{

I think the reference to DSM is leftover from the previous version of
the patch. Can we change the above comments as per the new code?

9.
+static bool
+slotsync_worker_launch()
{
...
+ /* TODO: do we really need 'generation', analyse more here */
+ worker->hdr.generation++;

We should do something about this TODO. As per my understanding, we
don't need a generation number for the slot sync worker as we have one
such worker but I guess the patch requires it because we are using
existing logical replication worker infrastructure. This brings the
question of whether we really need a separate SlotSyncWorkerInfo or if
we can use existing LogicalRepWorker and distinguish it with
LogicalRepWorkerType? I guess you didn't use it because most of the
fields in LogicalRepWorker will be unused for slot sync worker.

10.
+ * Can't use existing functions like 'get_database_oid' from dbcommands.c for
+ * validity purpose as they need db connection.
+ */
+static bool
+validate_dbname(const char *dbname)

I don't know how important it is to validate the dbname before
launching the sync slot worker because anyway after launching, it will
give an error while initializing the connection if the dbname is
invalid. But, if we think it is really required, did you consider
using GetDatabaseTuple()?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-11-17 11:48:49 Re: Synchronizing slots from primary to standby
Previous Message Drouvot, Bertrand 2023-11-17 11:38:14 Re: Synchronizing slots from primary to standby