Re: Refactoring backend fork+exec code

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, Tristan Partin <tristan(at)neon(dot)tech>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Refactoring backend fork+exec code
Date: 2023-12-08 12:33:33
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30/11/2023 22:26, Andres Freund wrote:
> On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
>> [...]
>> 33 files changed, 1787 insertions(+), 2002 deletions(-)
> Well, that's not small...
> I think it may be worth splitting some of the file renaming out into a
> separate commit, makes it harder to see what change here.

Here you are (details at end of this email)

>> + [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>> + [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
>> + [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
>> + [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
>> +
>> + [PMC_STARTUP] = {"startup", StartupProcessMain, true},
>> + [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
>> + [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
>> + [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
>> + [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
>> + [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
>> +};
> It feels like we have too many different ways of documenting the type of a
> process. This new PMC_ stuff, enum AuxProcType, enum BackendType.

Agreed. And "am_walsender" and such variables.

> Which then leads to code like this:
>> -CheckpointerMain(void)
>> +CheckpointerMain(char *startup_data, size_t startup_data_len)
>> {
>> sigjmp_buf local_sigjmp_buf;
>> MemoryContext checkpointer_context;
>> + Assert(startup_data_len == 0);
>> +
>> + MyAuxProcType = CheckpointerProcess;
>> + MyBackendType = B_CHECKPOINTER;
>> + AuxiliaryProcessInit();
>> +
> For each type of child process. That seems a bit too redundant. Can't we
> unify this at least somewhat? Can't we just reuse BackendType here? Sure,
> there'd be pointless entry for B_INVALID, but that doesn't seem like a
> problem, could even be useful, by pointing it to a function raising an error.

There are a few differences: B_INVALID (and B_STANDALONE_BACKEND) are
pointless for this array as you noted. But also, we don't know if the
backend is a regular backend or WAL sender until authentication, so for
a WAL sender, we'd need to change MyBackendType from B_BACKEND to
B_WAL_SENDER after forking. Maybe that's ok.

I didn't do anything about this yet, but I'll give it some more thought.

>> + if (strncmp(argv[1], "--forkchild=", 12) != 0)
>> + elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)");
>> + entry_name = argv[1] + 12;
>> + found = false;
>> + for (int idx = 0; idx < lengthof(entry_kinds); idx++)
>> + {
>> + if (strcmp(entry_kinds[idx].name, entry_name) == 0)
>> + {
>> + child_type = idx;
>> + found = true;
>> + break;
>> + }
>> + }
>> + if (!found)
>> + elog(ERROR, "unknown child kind %s", entry_name);
> If we then have to search linearly, why don't we just pass the index into the
> array?

We could. I like the idea of a human-readable name on the command line,
although I'm not sure if it's really visible anywhere.

>> +void
>> +BackendMain(char *startup_data, size_t startup_data_len)
>> +{
> Is there any remaining reason for this to live in postmaster.c? Given that
> other backend types don't, that seems oddly assymmetrical.

Gee, another yak to shave, thanks ;-). You're right, that makes a lot of
sense. I added another patch that moves that to a new file,
src/backend/tcop/backend_startup.c. ProcessStartupPacket() and friends
go there too. It might make sense to do this before the other patches,
but it's the last patch in the series now.

I kept processCancelRequest() in postmaster.c because it looks at
BackendList/ShmemBackendArray, which are static in postmaster.c. Some
more refactoring might be in order there, perhaps moving those to a
different file too. But that can be done separately, this split is
pretty OK as is.

On 30/11/2023 20:44, Tristan Partin wrote:
>> From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
>> Date: Thu, 30 Nov 2023 00:04:22 +0200
>> Subject: [PATCH v3 4/7] Pass CAC as argument to backend process
> For me, being new to the code, it would be nice to have more of an
> explanation as to why this is "better." I don't doubt it; it would just
> help me and future readers of this commit in the future. More of an
> explanation in the commit message would suffice.

Updated the commit message. It's mainly to pave the way for the next
patches, which move the initialization of Port to the backend process,
after forking. And that in turn paves the way for the patches after
that. But also, very subjectively, it feels more natural to me.

> My other comment on this commit is that we now seem to have lost the
> context on what CAC stands for. Before we had the member variable to
> explain it. A comment on the enum would be great or changing cac named
> variables to canAcceptConnections. I did notice in patch 7 that there
> are still some variables named canAcceptConnections around, so I'll
> leave this comment up to you.

Good point. The last patch in this series - which is new compared to
previous patch version - moves CAC_state to a different header file
again. I added a comment there.

>> + if (fwrite(param, paramsz, 1, fp) != 1)
>> + {
>> + ereport(LOG,
>> + (errcode_for_file_access(),
>> + errmsg("could not write to file \"%s\": %m", tmpfilename)));
>> + FreeFile(fp);
>> + return -1;
>> + }
>> +
>> + /* Release file */
>> + if (FreeFile(fp))
>> + {
>> + ereport(LOG,
>> + (errcode_for_file_access(),
>> + errmsg("could not write to file \"%s\": %m", tmpfilename)));
>> + return -1;
>> + }
> Two pieces of feedback here. I generally find write(2) more useful than
> fwrite(3) because write(2) will report a useful errno, whereas fwrite(2)
> just uses ferror(3). The additional errno information might be valuable
> context in the log message. Up to you if you think it is also valuable.

In general I agree. This patch just moves existing code though, so I
left it as is.

> The log message if FreeFile() fails doesn't seem to make sense to me.
> I didn't see any file writing in that code path, but it is possible that
> I missed something.

FreeFile() calls fclose(), which flushes the buffer. If fclose() fails,
it's most likely because the write() to flush the buffer failed, so
"could not write" is usually appropriate. (It feels ugly to me too,
error handling with the buffered i/o functions is a bit messy. As you
said, plain open()/write() is more clear.)

>> + /*
>> + * Need to reinitialize the SSL library in the backend, since the context
>> + * structures contain function pointers and cannot be passed through the
>> + * parameter file.
>> + *
>> + * If for some reason reload fails (maybe the user installed broken key
>> + * files), soldier on without SSL; that's better than all connections
>> + * becoming impossible.
>> + *
>> + * XXX should we do this in all child processes? For the moment it's
>> + * enough to do it in backend children.
>> + */
>> +#ifdef USE_SSL
>> + if (EnableSSL)
>> + {
>> + if (secure_initialize(false) == 0)
>> + LoadedSSL = true;
>> + else
>> + ereport(LOG,
>> + (errmsg("SSL configuration could not be loaded in child process")));
>> + }
>> +#endif
> Do other child process types do any non-local communication?

No. Although in theory an extension-defined background worker could do
whatever, including opening TLS connections. It's not clear if such a
background worker would want the same initialization that we do in
secure_initialize(), or something else.

Here is a new patch set:

> v5-0001-Pass-CAC-as-argument-to-backend-process.patch
> v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
> v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch

These patches form a pretty well-contained unit. The gist is to move the
initialization of the Port struct to after forking the backend process
(in patch 3).

I plan to polish and commit these next, so any final reviews on these
are welcome.

> v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch
> v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch
> v5-0006-Refactor-AuxProcess-startup.patch
> v5-0007-Refactor-postmaster-child-process-launching.patch

Patches 4-6 are refactorings that don't do much good on their own, but
they help to make patch 7 much smaller and easier to review.

I left out some of the code-moving that I had in previous patch versions:

- Previously I moved fork_process() function from fork_process.c to the
new launch_backend.c file. That might still make sense, there is nothing
else in fork_process.c and the only caller is in launch_backend.c. But
I'm not sure, and it can be done separately.

- Previously I moved InitPostmasterChild from miscinit.c to the new
launch_backend.c file. That might also still make sense, but I'm not
100% sure it's an improvement, and it can be done later if we want to.

> v5-0008-Move-code-for-backend-startup-to-separate-file.patch

This moves BackendMain() and friends from postmaster.c to a new file,
per Andres's suggestion.

Heikki Linnakangas
Neon (

Attachment Content-Type Size
v5-0001-Pass-CAC-as-argument-to-backend-process.patch text/x-patch 5.6 KB
v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch text/x-patch 3.9 KB
v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch text/x-patch 27.4 KB
v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch text/x-patch 3.2 KB
v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch text/x-patch 49.7 KB
v5-0006-Refactor-AuxProcess-startup.patch text/x-patch 8.7 KB
v5-0007-Refactor-postmaster-child-process-launching.patch text/x-patch 70.2 KB
v5-0008-Move-code-for-backend-startup-to-separate-file.patch text/x-patch 57.7 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-12-08 12:36:05 Re: Remove some unnecessary includes of "access/xlog_internal.h"
Previous Message Zhijie Hou (Fujitsu) 2023-12-08 12:05:50 RE: Synchronizing slots from primary to standby