Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-28 10:02:53
Message-ID: CAJpy0uDUAbY_gK-bOvoEGBA9KqqU_EDqQ0jAQGk2o1_EWSZtbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 17, 2023 at 5:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > PFA v35.
> >
>
> Review v35-0002*
> ==============
> 1.
> As quoted in the commit message,
> >
> If a logical slot is invalidated on the primary, slot on the standby is also
> invalidated. If a logical slot on the primary is valid but is invalidated
> on the standby due to conflict (say required rows removed on the primary),
> then that slot is dropped and recreated on the standby in next sync-cycle.
> It is okay to recreate such slots as long as these are not consumable on the
> standby (which is the case currently).
> >
>
> I think this won't happen normally because of the physical slot and
> hot_standby_feedback but probably can occur in cases like if the user
> temporarily switches hot_standby_feedback from on to off. Are there
> any other reasons? I think we can mention the cases along with it as
> well at least for now. Additionally, I think this should be covered in
> code comments as well.
>
> 2.
> #include "postgres.h"
> -
> +#include "access/genam.h"
>
> Spurious line removal.
>
> 3.
> A password needs to be provided too, if the sender demands password
> authentication. It can be provided in the
> <varname>primary_conninfo</varname> string, or in a separate
> - <filename>~/.pgpass</filename> file on the standby server (use
> - <literal>replication</literal> as the database name).
> - Do not specify a database name in the
> - <varname>primary_conninfo</varname> string.
> + <filename>~/.pgpass</filename> file on the standby server.
> + </para>
> + <para>
> + Specify <literal>dbname</literal> in
> + <varname>primary_conninfo</varname> string to allow synchronization
> + of slots from the primary server to the standby server.
> + This will only be used for slot synchronization. It is ignored
> + for streaming.
>
> Is there a reason to remove part of the earlier sentence "use
> <literal>replication</literal> as the database name"?
>
> 4.
> + <primary><varname>enable_syncslot</varname> configuration
> parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + It enables a physical standby to synchronize logical failover slots
> + from the primary server so that logical subscribers are not blocked
> + after failover.
> + </para>
> + <para>
> + It is enabled by default. This parameter can only be set in the
> + <filename>postgresql.conf</filename> file or on the server
> command line.
> + </para>
>
> I think you forgot to update the documentation for the default value
> of this variable.
>
> 5.
> + * a) start the logical replication workers for every enabled subscription
> + * when not in standby_mode
> + * b) start the slot-sync worker for logical failover slots synchronization
> + * from the primary server when in standby_mode.
>
> Either use a full stop after both lines or none of these.
>
> 6.
> +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
>
> There shouldn't be space between * and the worker.
>
> 7.
> + if (!SlotSyncWorker->hdr.in_use)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker not initialized, "
> + "cannot attach")));
> + }
> +
> + if (SlotSyncWorker->hdr.proc)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker is "
> + "already running, cannot attach")));
> + }
>
> Using slot-sync in the error messages looks a bit odd to me. Can we
> use "replication slot sync worker ..." in both these and other
> similar messages? I think it would be better if we don't split the
> messages into multiple lines in these cases as messages don't appear
> too long to me.
>
> 8.
> +/*
> + * Detach the worker from DSM and update 'proc' and 'in_use'.
> + * Logical replication launcher will come to know using these
> + * that the worker has shutdown.
> + */
> +void
> +slotsync_worker_detach(int code, Datum arg)
> +{
>
> I think the reference to DSM is leftover from the previous version of
> the patch. Can we change the above comments as per the new code?
>
> 9.
> +static bool
> +slotsync_worker_launch()
> {
> ...
> + /* TODO: do we really need 'generation', analyse more here */
> + worker->hdr.generation++;
>
> We should do something about this TODO. As per my understanding, we
> don't need a generation number for the slot sync worker as we have one
> such worker but I guess the patch requires it because we are using
> existing logical replication worker infrastructure. This brings the
> question of whether we really need a separate SlotSyncWorkerInfo or if
> we can use existing LogicalRepWorker and distinguish it with
> LogicalRepWorkerType? I guess you didn't use it because most of the
> fields in LogicalRepWorker will be unused for slot sync worker.
>
> 10.
> + * Can't use existing functions like 'get_database_oid' from dbcommands.c for
> + * validity purpose as they need db connection.
> + */
> +static bool
> +validate_dbname(const char *dbname)
>
> I don't know how important it is to validate the dbname before
> launching the sync slot worker because anyway after launching, it will
> give an error while initializing the connection if the dbname is
> invalid. But, if we think it is really required, did you consider
> using GetDatabaseTuple()?

I have removed 'validate_dbname' in v40. We let dbname go through
BackgroundWorkerInitializeConnection() which internally does dbname
validation. Later if 'primary_conninfo' is changed and the db name
specified in it is different, we exit the worker and let it get
restarted which will do the validation again when it does
BackgroundWorkerInitializeConnection().

>
> --
> With Regards,
> Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-11-28 10:04:38 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Dean Rasheed 2023-11-28 10:00:44 Re: [PATCH] psql: Add tab-complete for optional view parameters