Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-19 13:19:03
Message-ID: CAA4eK1J5zTmm4NE4os59WgU4AZPNb74X-n67pY8SkoDfzsN_jA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 19, 2023 at 5:17 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Dec 18, 2023 at 4:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > Sorry, I missed attaching the patch. PFA v48.
> > >
> >
> > Few comments on v48_0002
> > ========================
>
> Thanks for reviewing. These are addressed in v50.
>

I was still reviewing the v48 version and have a few comments as
below. If some of these are already addressed or not relevant, feel
free to ignore them.
1.
+ /*
+ * Slot sync worker can be stopped at any time.
+ * Use exit status 1 so the background worker is restarted.

We don't need to start the second line of comment in a separate line.

2.
+ * The assumption is that these dropped local invalidated slots will get
+ * recreated in next sync-cycle and it is okay to drop and recreate such slots

In the above line '.. local invalidated ..' sounds redundant. Shall we
remove it?

3.
+ if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd)
+ {
+ SpinLockRelease(&WalRcv->mutex);
+ elog(ERROR, "skipping sync of slot \"%s\" as the received slot sync "

This error message looks odd to me. At least, it should be exiting
instead of skipping because we won't continue after this.

4.
+ /* User created slot with the same name exists, raise ERROR. */
+ if (sync_state == SYNCSLOT_STATE_NONE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipping sync of slot \"%s\" as it is a user created"
+ " slot", remote_slot->name),
+ errdetail("This slot has failover enabled on the primary and"
+ " thus is sync candidate but user created slot with"
+ " the same name already exists on the standby")));

Same problem as above. The skipping in error message doesn't seem to
be suitable for the purpose. Additionally, errdetail message should
end with a full stop.

5.
+ /* Slot ready for sync, so sync it. */
+ if (sync_state == SYNCSLOT_STATE_READY)
+ {
+ /*
+ * Sanity check: With hot_standby_feedback enabled and
+ * invalidations handled appropriately as above, this should never
+ * happen.
+ */
+ if (remote_slot->restart_lsn < slot->data.restart_lsn)
+ elog(ERROR,
+ "not synchronizing local slot \"%s\" LSN(%X/%X)"
+ " to remote slot's LSN(%X/%X) as synchronization "

The start of the error message sounds odd. Shall we say 'cannot
synchronize ...'?

6. All except one of the callers of local_slot_update() marks the slot
dirty and the same is required as well. I think the remaining caller
should also mark it dirty and we should move
ReplicationSlotMarkDirty() in the caller space.

7.
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
{
...
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
+ }
...

It doesn't seem that after changing the invalidated flag, we always
mark the slot dirty. Am, I missing something?

8.
+ /*
+ * Drop local slots that no longer need to be synced. Do it before
+ * synchronize_one_slot to allow dropping of slots before actual sync
+ * which are invalidated locally while still valid on the primary server.
+ */
+ drop_obsolete_slots(remote_slot_list);

The second part of the above comment seems redundant as that is obvious.

9.
+static WalReceiverConn *
+remote_connect(void)
+{
+ WalReceiverConn *wrconn = NULL;
+ char *err;
+
+ wrconn = walrcv_connect(PrimaryConnInfo, true, false,
+ cluster_name[0] ? cluster_name : "slotsyncworker",
+ &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not connect to the primary server: %s", err)));
+ return wrconn;
+}

Do we need a function for this? It appears to be called from just one
place, so not sure if it is helpful to have a function for this.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Karina Litskevich 2023-12-19 13:48:14 Correct the documentation of extension building infrastructure
Previous Message Hayato Kuroda (Fujitsu) 2023-12-19 13:05:17 RE: Synchronizing slots from primary to standby