Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-19 06:18:20
Message-ID: CAHut+Pt5Pk_xJkb54oahR+f9oawgfnmbpewvkZPgnRhoJ3gkYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v63-0003.

======
Commit Message

1.
This patch attempts to start slot-sync worker as a special process
which is neither a bgworker nor an Auxiliary process. The benefit
we get here is we can control the start-conditions of the worker which
further allows us to 'enable_syncslot' as PGC_SIGHUP which was otherwise
a PGC_POSTMASTER GUC when slotsync worker was registered as bgworker.

~

missing word?

/allows us to/allows us to define/

======
src/backend/postmaster/postmaster.c

2. process_pm_child_exit

+ /*
+ * Was it the slot sync worker? Normal exit or FATAL exit (FATAL can
+ * be caused by libpqwalreceiver on receiving shutdown request by the
+ * startup process during promotion) can be ignored; we'll start a new
+ * one at the next iteration of the postmaster's main loop, if
+ * necessary. Any other exit condition is treated as a crash.
+ */
+ if (pid == SlotSyncWorkerPID)
+ {
+ SlotSyncWorkerPID = 0;
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+ HandleChildCrash(pid, exitstatus,
+ _("Slotsync worker process"));
+ continue;
+ }

2a.

I think the 2nd sentence is easier to read if written like:

Normal exit or FATAL exit can be ignored (FATAL can be caused by
libpqwalreceiver on receiving shutdown request by the startup process
during promotion);

~

2b.
All other names nearby are lowercase so maybe change "Slotsync worker
process" to ""slotsync worker process" or ""slot sync worker process".

======
src/backend/replication/logical/slotsync.c

3. check_primary_info

if (!valid)
- ereport(ERROR,
+ {
+ *primary_slot_invalid = true;
+ ereport(LOG,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("exiting from slot synchronization due to bad configuration"),
+ errmsg("skipping slot synchronization due to bad configuration"),
/* translator: second %s is a GUC variable name */
errdetail("The primary server slot \"%s\" specified by %s is not valid.",
PrimarySlotName, "primary_slot_name"));
+ }

Somehow it seems more appropriate for the *caller* to decide what to
do (e.g. "skipping...") when the primary slot is invalid. See also
the next review comment #4b -- maybe just change this LOG to say "bad
configuration for slot synchronization".

~~~

4.
/*
* Check that all necessary GUCs for slot synchronization are set
- * appropriately. If not, raise an ERROR.
+ * appropriately. If not, log the message and pass 'valid' as false
+ * to the caller.
*
* If all checks pass, extracts the dbname from the primary_conninfo GUC and
* returns it.
*/
static char *
-validate_parameters_and_get_dbname(void)
+validate_parameters_and_get_dbname(bool *valid)

4a.
This feels back-to-front. I think a "validate" function should return
boolean. It can return the dbname as a side-effect only when it is
valid.

SUGGESTION
static boolean
validate_parameters_and_get_dbname(char *dbname)

~

4b.
It was a bit different when there were ERRORs but now they are LOGs
somehow it seems wrong for this function to say what the *caller* will
do. Maybe you can rewrite all the errmsg so the don't say "skipping"
but they just say "bad configuration for slot synchronization"

If valid is false then you can LOG "skipping" at the caller...

~~~

5. wait_for_valid_params_and_get_dbname

+ dbname = validate_parameters_and_get_dbname(&valid);
+ if (valid)
+ break;
+ else

This code will be simpler when the function is change to return
boolean as suggested above in #4a.

Also the 'else' is unnecessary.

SUGGESTION
if (validate_parameters_and_get_dbname(&dbname)
break;
~

6.
+ if (rc & WL_LATCH_SET)
+ ResetLatch(MyLatch);
+
+ }
+ }

Unnecessary blank line.

~~~

7. slotsync_reread_config

+ if (old_enable_syncslot != enable_syncslot)
+ {
+ /*
+ * We have reached here, so old value must be true and new must be
+ * false.
+ */
+ Assert(old_enable_syncslot);
+ Assert(!enable_syncslot);

I felt it would be better just to say Assert(enable_syncslot); at the
top of this function (before the ProcessConfigFile). Then none of this
other comment/assert if really needed because it should be
self-evident.

~~~

8. StartSlotSyncWorker

int
StartSlotSyncWorker(void)
{
pid_t pid;

#ifdef EXEC_BACKEND
switch ((pid = slotsyncworker_forkexec()))
#else
switch ((pid = fork_process()))
#endif
{
case -1:
ereport(LOG,
(errmsg("could not fork slot sync worker process: %m")));
return 0;

#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
InitPostmasterChild();

/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

ReplSlotSyncWorkerMain(0, NULL);
break;
#endif
default:
return (int) pid;
}

/* shouldn't get here */
return 0;
}

The switch code can be rearranged so you don't need the #ifndef

SUGGESTION
#ifdef EXEC_BACKEND
switch ((pid = slotsyncworker_forkexec()))
{
#else
switch ((pid = fork_process()))
{
case 0:
/* in postmaster child ... */
InitPostmasterChild();

/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

ReplSlotSyncWorkerMain(0, NULL);
break;
#endif
case -1:
ereport(LOG,
(errmsg("could not fork slot sync worker process: %m")));
return 0;
default:
return (int) pid;
}

======
src/backend/storage/lmgr/proc.c

9. InitProcess

* this; it probably should.)
+ *
+ * Slot sync worker does not participate in it, see comments atop Backend.
*/
- if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess())
+ if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() &&
+ !IsLogicalSlotSyncWorker())
MarkPostmasterChildActive();

9a.
/does not participate in it/also does not participate in it/

~

9b.
It's not clear where "atop Backend" is referring to.

~~~

10.
* way, so tell the postmaster we've cleaned up acceptably well. (XXX
* autovac launcher should be included here someday)
*/
- if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess())
+ if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() &&
+ !IsLogicalSlotSyncWorker())
MarkPostmasterChildInactive();

Should this comment also be updated to mention slot sync worker?

======
src/backend/utils/activity/pgstat_io.c

11. pgstat_tracks_io_bktype

case B_WAL_SENDER:
+ case B_SLOTSYNC_WORKER:
return true;
}

Notice all the other enums were arrange in alphabetical order, so do
the same here.

======
src/backend/utils/init/miscinit.c

12. GetBackendTypeDesc

+ case B_SLOTSYNC_WORKER:
+ backendDesc = "slotsyncworker";
+ break;
}
All the other case are in alphabetical order, same as the enum values,
so do the same here.

~~~

13. InitializeSessionUserIdStandalone

* This function should only be called in single-user mode, in autovacuum
* workers, and in background workers.
*/
- Assert(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() ||
IsBackgroundWorker);
+ Assert(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() ||
+ IsLogicalSlotSyncWorker() || IsBackgroundWorker);

Looks like this Assert has a stale comment that should be updated.

======
src/include/miscadmin.h

14. GetBackendTypeDesc

B_WAL_SUMMARIZER,
+ B_SLOTSYNC_WORKER,
B_WAL_WRITER,
} BackendType;

It seems strange to jam this new value among the other B_WAL enums.
Anyway, it looks like everything else is in alphabetical order, so we
do that too.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-01-19 06:20:00 Re: System username in pg_stat_activity
Previous Message Andy Fan 2024-01-19 06:17:13 Re: the s_lock_stuck on perform_spin_delay