Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-20 06:31:38
Message-ID: CAHut+PvocO_bwwz7kD-4mLnFRCLOK3i0ocLyGDvLQKzkhzEjTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some comments for the patch v50-0002.

======
GENERAL

(I made a short study of all the ereports in this patch -- here are
some findings)

~~~

0.1 Don't need the parentheses.

Checking all the ereports I see that half of them have the redundant
parentheses and half of them do not; You might as well make them all
use the new style where the extra parentheses are not needed.

e.g.
+ ereport(LOG,
+ (errmsg("skipping slot synchronization"),
+ errdetail("enable_syncslot is disabled.")));

e.g.
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot drop replication slot \"%s\"", name),
+ errdetail("This slot is being synced from the primary server.")));

and many more like this. Search for all the ereports.

~~~

0.2
+ ereport(LOG,
+ (errmsg("dropped replication slot \"%s\" of dbid %d as it "
+ "was not sync-ready", NameStr(s->data.name),
+ s->data.database)));

I felt maybe that could be:

errmsg("dropped replication slot \"%s\" of dbid %d", ...
errdetail("It was not sync-ready.")

(now this shares the same errmsg with another ereport)

~~~

0.3.
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipping sync of slot \"%s\" as it is a user created"
+ " slot", remote_slot->name),
+ errdetail("This slot has failover enabled on the primary and"
+ " thus is sync candidate but user created slot with"
+ " the same name already exists on the standby.")));

This seemed too wordy. Can't it be shortened (maybe like below)
without losing any of the vital information?

errmsg("skipping sync of slot \"%s\"", ...)
errdetail("A user-created slot with the same name already exists on
the standby.")

~~~

0.4
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("exiting from slot synchronization due to bad configuration"),
+ /* translator: second %s is a GUC variable name */
+ errdetail("The primary slot \"%s\" specified by %s is not valid.",
+ PrimarySlotName, "primary_slot_name")));

/The primary slot/The primary server slot/

~~~

0.5
+ ereport(ERROR,
+ (errmsg("could not fetch primary_slot_name \"%s\" info from the "
+ "primary: %s", PrimarySlotName, res->err)));

/primary:/primary server:/

~~~

0.6
The continuations for long lines are inconsistent. Sometimes there are
trailing spaces and sometimes there are leading spaces. And sometimes
there are both at the same time which would cause double-spacing in
the message! Please make them all the same. I think using leading
spaces is easier but YMMV.

e.g.
+ elog(ERROR,
+ "not synchronizing local slot \"%s\" LSN(%X/%X)"
+ " to remote slot's LSN(%X/%X) as synchronization "
+ " would move it backwards", remote_slot->name,
+ LSN_FORMAT_ARGS(slot->data.restart_lsn),
+ LSN_FORMAT_ARGS(remote_slot->restart_lsn));

======
src/backend/replication/logical/slotsync.c

1. check_primary_info

+ /* No need to check further, return that we are cascading standby */
+ if (remote_in_recovery)
+ {
+ *am_cascading_standby = true;
+ ExecClearTuple(tupslot);
+ walrcv_clear_result(res);
+ CommitTransactionCommand();
+ return;
+ }
+
+ valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
+ Assert(!isnull);
+
+ if (!valid)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("exiting from slot synchronization due to bad configuration"),
+ /* translator: second %s is a GUC variable name */
+ errdetail("The primary slot \"%s\" specified by %s is not valid.",
+ PrimarySlotName, "primary_slot_name")));
+ ExecClearTuple(tupslot);
+ walrcv_clear_result(res);
+ CommitTransactionCommand();
+}

Now that there is a common cleanup/return code this function be
reduced further like below:

SUGGESTION

if (remote_in_recovery)
{
/* No need to check further, return that we are cascading standby */
*am_cascading_standby = true;
}
else
{
/* We are a normal standby. */

valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
Assert(!isnull);

if (!valid)
...
}

ExecClearTuple(tupslot);
walrcv_clear_result(res);
CommitTransactionCommand();
}

~~~

2. ReplSlotSyncWorkerMain

+ /*
+ * One can promote the standby and we can no longer be a cascading
+ * standby. So recheck here.
+ */
+ if (am_cascading_standby)
+ check_primary_info(wrconn, &am_cascading_standby);

Minor rewording of that new comment.

SUGGESTION
If the standby was promoted then what was previously a cascading
standby might no longer be one, so recheck each time.

======
src/test/recovery/t/050_verify_slot_order.pl

3.
+##################################################
+# Test that a synchronized slot can not be decoded, altered and
dropped by the user
+##################################################

/and dropped/or dropped/

~~~

4.
+
+($result, $stdout, $stderr) = $standby1->psql(
+ 'postgres',
+ qq[ALTER_REPLICATION_SLOT lsub1_slot (failover);],
+ replication => 'database');
+ok($stderr =~ /ERROR: cannot alter replication slot "lsub1_slot"/,
+ "synced slot on standby cannot be altered");
+

Add a comment for this test part

SUGGESTION
Attempting to alter a synced slot should result in an error

~~~

5.
IMO it would be better if the tests were done in the same order
mentioned in the comment. So either change the tests or change the
comment.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-12-20 06:48:22 Re: Change GUC hashtable to use simplehash?
Previous Message shveta malik 2023-12-20 06:01:04 Function to get invalidation cause of a replication slot.