Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-21 08:26:49
Message-ID: CAJpy0uC7-n+BqocDt=VwvfBJ3TuEpkUNd_RoiNT2YhgSpf3aiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 21, 2023 at 10:02 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, November 17, 2023 7:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> > wrote:
> > >
> > > PFA v35.
> > >
> >
> > Review v35-0002*
> > ==============
>
> Thanks for the comments.
>
> > 1.
> > As quoted in the commit message,
> > >
> > If a logical slot is invalidated on the primary, slot on the standby is also
> > invalidated. If a logical slot on the primary is valid but is invalidated on the
> > standby due to conflict (say required rows removed on the primary), then that
> > slot is dropped and recreated on the standby in next sync-cycle.
> > It is okay to recreate such slots as long as these are not consumable on the
> > standby (which is the case currently).
> > >
> >
> > I think this won't happen normally because of the physical slot and
> > hot_standby_feedback but probably can occur in cases like if the user
> > temporarily switches hot_standby_feedback from on to off. Are there any other
> > reasons? I think we can mention the cases along with it as well at least for now.
> > Additionally, I think this should be covered in code comments as well.
>
> I will collect all these cases and update in next version.
>
> >
> > 2.
> > #include "postgres.h"
> > -
> > +#include "access/genam.h"
> >
> > Spurious line removal.
>
> Removed.
>
> >
> > 3.
> > A password needs to be provided too, if the sender demands
> > password
> > authentication. It can be provided in the
> > <varname>primary_conninfo</varname> string, or in a separate
> > - <filename>~/.pgpass</filename> file on the standby server (use
> > - <literal>replication</literal> as the database name).
> > - Do not specify a database name in the
> > - <varname>primary_conninfo</varname> string.
> > + <filename>~/.pgpass</filename> file on the standby server.
> > + </para>
> > + <para>
> > + Specify <literal>dbname</literal> in
> > + <varname>primary_conninfo</varname> string to allow
> > synchronization
> > + of slots from the primary server to the standby server.
> > + This will only be used for slot synchronization. It is ignored
> > + for streaming.
> >
> > Is there a reason to remove part of the earlier sentence "use
> > <literal>replication</literal> as the database name"?
>
> Added it back.
>
> >
> > 4.
> > + <primary><varname>enable_syncslot</varname> configuration
> > parameter</primary>
> > + </indexterm>
> > + </term>
> > + <listitem>
> > + <para>
> > + It enables a physical standby to synchronize logical failover slots
> > + from the primary server so that logical subscribers are not blocked
> > + after failover.
> > + </para>
> > + <para>
> > + It is enabled by default. This parameter can only be set in the
> > + <filename>postgresql.conf</filename> file or on the server
> > command line.
> > + </para>
> >
> > I think you forgot to update the documentation for the default value of this
> > variable.
>
> Updated.
>
> >
> > 5.
> > + * a) start the logical replication workers for every enabled subscription
> > + * when not in standby_mode
> > + * b) start the slot-sync worker for logical failover slots synchronization
> > + * from the primary server when in standby_mode.
> >
> > Either use a full stop after both lines or none of these.
>
> Added a full stop.
>
> >
> > 6.
> > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
> >
> > There shouldn't be space between * and the worker.
>
> Removed, and added the type to typedefs.list.
>
> >
> > 7.
> > + if (!SlotSyncWorker->hdr.in_use)
> > + {
> > + LWLockRelease(SlotSyncWorkerLock);
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("replication slot-sync worker not initialized, "
> > + "cannot attach")));
> > + }
> > +
> > + if (SlotSyncWorker->hdr.proc)
> > + {
> > + LWLockRelease(SlotSyncWorkerLock);
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("replication slot-sync worker is "
> > + "already running, cannot attach")));
> > + }
> >
> > Using slot-sync in the error messages looks a bit odd to me. Can we use
> > "replication slot sync worker ..." in both these and other similar messages? I
> > think it would be better if we don't split the messages into multiple lines in
> > these cases as messages don't appear too long to me.
>
> Changed as suggested.
>
> >
> > 8.
> > +/*
> > + * Detach the worker from DSM and update 'proc' and 'in_use'.
> > + * Logical replication launcher will come to know using these
> > + * that the worker has shutdown.
> > + */
> > +void
> > +slotsync_worker_detach(int code, Datum arg) {
> >
> > I think the reference to DSM is leftover from the previous version of the patch.
> > Can we change the above comments as per the new code?
>
> Changed.
>
> >
> > 9.
> > +static bool
> > +slotsync_worker_launch()
> > {
> > ...
> > + /* TODO: do we really need 'generation', analyse more here */
> > + worker->hdr.generation++;
> >
> > We should do something about this TODO. As per my understanding, we don't
> > need a generation number for the slot sync worker as we have one such worker
> > but I guess the patch requires it because we are using existing logical
> > replication worker infrastructure.

Yes, we do not need generation, but since we want to use existing
logical-rep worker infrastructure, we can retain generation but can
keep it as zero always for the slot-sync worker case.

> > This brings the question of whether we really
> > need a separate SlotSyncWorkerInfo or if we can use existing
> > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you
> > didn't use it because most of the fields in LogicalRepWorker will be unused for
> > slot sync worker.

Yes, right. If we use LogicalRepWorker in the slot-sync worker, then
it will be a task to keep a check (even in future) that no-one should
end up using uninitialized fields in slot-sync code. That is why
shifting common fields to LogicalWorkerHeader and using that in
SlotSyncWorkerInfo and LogicalRepWorker seems a better approach to me.

>
> Will think about this one and update in next version.
>
> >
> > 10.
> > + * Can't use existing functions like 'get_database_oid' from
> > +dbcommands.c for
> > + * validity purpose as they need db connection.
> > + */
> > +static bool
> > +validate_dbname(const char *dbname)
> >
> > I don't know how important it is to validate the dbname before launching the
> > sync slot worker because anyway after launching, it will give an error while
> > initializing the connection if the dbname is invalid. But, if we think it is really
> > required, did you consider using GetDatabaseTuple()?
>
> Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it possible to
> release the restriction for the dbname. For example, slot sync worker could
> always connect to the 'template1' as the worker doesn't update the
> database objects. Although I didn't find some examples on server side, but some
> client commands(e.g. pg_upgrade) will connect to template1 to check some global
> objects. (Just FYI, the previous version patch used a replication command which
> may avoid the dbname but was replaced with SELECT to improve the flexibility and
> avoid introducing new command.)
>
> Best Regards,
> Hou zj
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-11-21 08:32:19 Re: Synchronizing slots from primary to standby
Previous Message Drouvot, Bertrand 2023-11-21 07:43:16 Re: Synchronizing slots from primary to standby