From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-18 10:52:28
Message-ID: CAA4eK1Ko-EBBDkea2R8V8PeveGg10PBswCF7JQdnRu+MJP+YBQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Sorry, I missed attaching the patch. PFA v48.
>

Few comments on v48_0002
========================
1.
+static void
+slotsync_reread_config(WalReceiverConn *wrconn)
{
...
+ pfree(old_primary_conninfo);
+ pfree(old_primary_slotname);
+
+ if (restart)
+ {
+ char *msg = "slot sync worker will restart because of a parameter change";
+
+ /*
+ * The exit code 1 will make postmaster restart the slot sync worker.
+ */
+ slotsync_worker_exit(msg, 1 /* proc_exit code */ );
+ }
...

I don't see the need to explicitly pfree these when we are already
exiting the process, because the memory will be released anyway. That
also lets us avoid the 'restart' variable. Also, it probably makes
sense to exit directly here, and likewise at the other place where this
function is used. I see that in maybe_reread_subscription() we exit
with code 0 and the apply worker still restarts, so why use a different
exit code here?
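
An untested sketch of what I have in mind, assuming the worker keeps
comparing against the PrimaryConnInfo and PrimarySlotName GUCs and
slotsync_worker_exit() retains its current signature (the exit-code
question above is a separate point):

if (strcmp(old_primary_conninfo, PrimaryConnInfo) != 0 ||
    strcmp(old_primary_slotname, PrimarySlotName) != 0)
{
    /*
     * No need to pfree the old values; the process memory is released
     * on exit anyway.
     */
    slotsync_worker_exit("slot sync worker will restart because of a parameter change",
                         1 /* proc_exit code */ );
}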

2.
+static void
+check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby)
{
...
+ remote_in_recovery = DatumGetBool(slot_getattr(tupslot, 1, &isnull));
+ Assert(!isnull);
+
+ /* No need to check further, return that we are cascading standby */
+ if (remote_in_recovery)
+ {
+ *am_cascading_standby = true;
+ CommitTransactionCommand();
+ return;
...
}

Don't we need to clear the result and tuple in case of early return?
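
Something along these lines before the early return is what I mean
(untested; 'res' refers to the WalRcvExecResult from which the tuple
slot was built):

if (remote_in_recovery)
{
    *am_cascading_standby = true;

    /* Release the tuple slot and the remote query result */
    ExecClearTuple(tupslot);
    walrcv_clear_result(res);

    CommitTransactionCommand();
    return;
}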

3. It would be a good idea to mention requirements like a physical
slot on the primary, hot_standby_feedback, etc. in the commit
message.

4.
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+ bool *wait_attempts_exceeded)
{
...
+ tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+ if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+ {
+ ereport(WARNING,
+ (errmsg("slot \"%s\" creation aborted", remote_slot->name),
+ errdetail("This slot was not found on the primary server")));
...
+ /*
+ * It is possible to get null values for LSN and Xmin if slot is
+ * invalidated on the primary server, so handle accordingly.
+ */
+ new_invalidated = DatumGetBool(slot_getattr(tupslot, 1, &isnull));
+ Assert(!isnull);
+
+ new_restart_lsn = DatumGetLSN(slot_getattr(tupslot, 2, &isnull));
+ if (new_invalidated || isnull)
+ {
+ ereport(WARNING,
+ (errmsg("slot \"%s\" creation aborted", remote_slot->name),
+ errdetail("This slot was invalidated on the primary server")));
...
}

a. The errdetail message should end with a full stop. Please check all
other errdetail messages in the patch and follow the same guideline.
b. I think saying the slot creation was aborted is not completely
correct because we would have already created the slot, especially
when it is in the 'i' state. Can we change it to something like:
"aborting initial sync for slot \"%s\""?
c. Also, if the remote slot is invalidated, we could ideally drop the
local slot right away, but it seems the patch will drop it before the
next sync cycle along with any other slots that need to be dropped. If
so, can we add a comment to indicate that?
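
For (a) and (b), something like this is what I am thinking of (exact
wording can be discussed):

ereport(WARNING,
        (errmsg("aborting initial sync for slot \"%s\"", remote_slot->name),
         errdetail("This slot was invalidated on the primary server.")));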

5.
+static void
+local_slot_update(RemoteSlot *remote_slot)
+{
+ Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+ LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn);
+ LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn,
+ remote_slot->catalog_xmin);
+ LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn,
+ remote_slot->restart_lsn);
+
+ SpinLockAcquire(&MyReplicationSlot->mutex);
+ MyReplicationSlot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&MyReplicationSlot->mutex);
...
...

The invalidated flag is already updated in the caller where required,
so why do we need to update it here as well?
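
IOW, if the caller takes care of it where required, the spinlock
section could simply be dropped (untested sketch, keeping the rest of
the function as-is):

static void
local_slot_update(RemoteSlot *remote_slot)
{
    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

    LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn);
    LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn,
                               remote_slot->catalog_xmin);
    LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn,
                                          remote_slot->restart_lsn);

    /* 'invalidated' is left to the caller to update, if required */
    ...
}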

--
With Regards,
Amit Kapila.
