RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-11-21 04:31:58
Message-ID: OS0PR01MB5716783B2FDBF8487DB82EDE94BBA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, November 17, 2023 7:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> >
> > PFA v35.
> >
>
> Review v35-0002*
> ==============

Thanks for the comments.

> 1.
> As quoted in the commit message,
> >
> If a logical slot is invalidated on the primary, slot on the standby is also
> invalidated. If a logical slot on the primary is valid but is invalidated on the
> standby due to conflict (say required rows removed on the primary), then that
> slot is dropped and recreated on the standby in next sync-cycle.
> It is okay to recreate such slots as long as these are not consumable on the
> standby (which is the case currently).
> >
>
> I think this won't happen normally because of the physical slot and
> hot_standby_feedback but probably can occur in cases like if the user
> temporarily switches hot_standby_feedback from on to off. Are there any other
> reasons? I think we can mention the cases along with it as well at least for now.
> Additionally, I think this should be covered in code comments as well.

I will collect all these cases and update in next version.

>
> 2.
> #include "postgres.h"
> -
> +#include "access/genam.h"
>
> Spurious line removal.

Removed.

>
> 3.
> A password needs to be provided too, if the sender demands
> password
> authentication. It can be provided in the
> <varname>primary_conninfo</varname> string, or in a separate
> - <filename>~/.pgpass</filename> file on the standby server (use
> - <literal>replication</literal> as the database name).
> - Do not specify a database name in the
> - <varname>primary_conninfo</varname> string.
> + <filename>~/.pgpass</filename> file on the standby server.
> + </para>
> + <para>
> + Specify <literal>dbname</literal> in
> + <varname>primary_conninfo</varname> string to allow
> synchronization
> + of slots from the primary server to the standby server.
> + This will only be used for slot synchronization. It is ignored
> + for streaming.
>
> Is there a reason to remove part of the earlier sentence "use
> <literal>replication</literal> as the database name"?

Added it back.

>
> 4.
> + <primary><varname>enable_syncslot</varname> configuration
> parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + It enables a physical standby to synchronize logical failover slots
> + from the primary server so that logical subscribers are not blocked
> + after failover.
> + </para>
> + <para>
> + It is enabled by default. This parameter can only be set in the
> + <filename>postgresql.conf</filename> file or on the server
> command line.
> + </para>
>
> I think you forgot to update the documentation for the default value of this
> variable.

Updated.

>
> 5.
> + * a) start the logical replication workers for every enabled subscription
> + * when not in standby_mode
> + * b) start the slot-sync worker for logical failover slots synchronization
> + * from the primary server when in standby_mode.
>
> Either use a full stop after both lines or none of these.

Added a full stop.

>
> 6.
> +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
>
> There shouldn't be space between * and the worker.

Removed, and added the type to typedefs.list.

>
> 7.
> + if (!SlotSyncWorker->hdr.in_use)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker not initialized, "
> + "cannot attach")));
> + }
> +
> + if (SlotSyncWorker->hdr.proc)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker is "
> + "already running, cannot attach")));
> + }
>
> Using slot-sync in the error messages looks a bit odd to me. Can we use
> "replication slot sync worker ..." in both these and other similar messages? I
> think it would be better if we don't split the messages into multiple lines in
> these cases as messages don't appear too long to me.

Changed as suggested.

>
> 8.
> +/*
> + * Detach the worker from DSM and update 'proc' and 'in_use'.
> + * Logical replication launcher will come to know using these
> + * that the worker has shutdown.
> + */
> +void
> +slotsync_worker_detach(int code, Datum arg) {
>
> I think the reference to DSM is leftover from the previous version of the patch.
> Can we change the above comments as per the new code?

Changed.

>
> 9.
> +static bool
> +slotsync_worker_launch()
> {
> ...
> + /* TODO: do we really need 'generation', analyse more here */
> + worker->hdr.generation++;
>
> We should do something about this TODO. As per my understanding, we don't
> need a generation number for the slot sync worker as we have one such worker
> but I guess the patch requires it because we are using existing logical
> replication worker infrastructure. This brings the question of whether we really
> need a separate SlotSyncWorkerInfo or if we can use existing
> LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you
> didn't use it because most of the fields in LogicalRepWorker will be unused for
> slot sync worker.

Will think about this one and update in next version.

>
> 10.
> + * Can't use existing functions like 'get_database_oid' from
> +dbcommands.c for
> + * validity purpose as they need db connection.
> + */
> +static bool
> +validate_dbname(const char *dbname)
>
> I don't know how important it is to validate the dbname before launching the
> sync slot worker because anyway after launching, it will give an error while
> initializing the connection if the dbname is invalid. But, if we think it is really
> required, did you consider using GetDatabaseTuple()?

Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it possible to
release the restriction for the dbname. For example, slot sync worker could
always connect to the 'template1' as the worker doesn't update the
database objects. Although I didn't find some examples on server side, but some
client commands(e.g. pg_upgrade) will connect to template1 to check some global
objects. (Just FYI, the previous version patch used a replication command which
may avoid the dbname but was replaced with SELECT to improve the flexibility and
avoid introducing new command.)

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-21 04:37:07 Re: Annoying build warnings from latest Apple toolchain
Previous Message Zhijie Hou (Fujitsu) 2023-11-21 04:31:39 RE: Synchronizing slots from primary to standby