Re: Refactoring backend fork+exec code

From: "Tristan Partin" <tristan(at)neon(dot)tech>
To: "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, "Andres Freund" <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Refactoring backend fork+exec code
Date: 2023-11-30 18:44:33
Message-ID: CXCDA9YFDZS1.Y5JUZBHLLXYR@neon.tech
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed Nov 29, 2023 at 5:36 PM CST, Heikki Linnakangas wrote:
> On 11/10/2023 14:12, Heikki Linnakangas wrote:
> Here's another rebased patch set. Compared to previous version, I did a
> little more refactoring around CreateSharedMemoryAndSemaphores and
> InitProcess:
>
> - patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
> CreateSharedMemoryAndSemaphores is now only called at postmaster
> startup, and a new function called AttachSharedMemoryStructs() is called
> in backends in EXEC_BACKEND mode. I extracted the common parts of those
> functions to a new static function. (Some of this refactoring used to be
> part of the 3rd patch in the series, but it seems useful on its own, so
> I split it out.)
>
> - patch 3 moves the call to AttachSharedMemoryStructs() to
> InitProcess(), reducing the boilerplate code a little.
>
>
> The patches are incrementally useful, so if you don't have time to
> review all of them, a review on a subset would be useful too.

> From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Thu, 30 Nov 2023 00:04:22 +0200
> Subject: [PATCH v3 4/7] Pass CAC as argument to backend process

For me, being new to the code, it would be nice to have more of an
explanation as to why this is "better." I don't doubt it; it would just
help me and future readers of this commit in the future. More of an
explanation in the commit message would suffice.

My other comment on this commit is that we now seem to have lost the
context on what CAC stands for. Before we had the member variable to
explain it. A comment on the enum would be great or changing cac named
variables to canAcceptConnections. I did notice in patch 7 that there
are still some variables named canAcceptConnections around, so I'll
leave this comment up to you.

> From 98f8397b32a0b36e221475b32697c9c5bbca86a0 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 11 Oct 2023 13:38:06 +0300
> Subject: [PATCH v3 5/7] Remove ConnCreate and ConnFree, and allocate Port in
> stack

I like it separate.

> From 79aab42705a8cb0e16e61c33052fc56fdd4fca76 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 11 Oct 2023 13:38:10 +0300
> Subject: [PATCH v3 6/7] Introduce ClientSocket, rename some funcs

> +static int BackendStartup(ClientSocket *port);

s/port/client_sock

> - port->remote_hostname = strdup(remote_host);
> + port->remote_hostname = pstrdup(remote_host);
> + MemoryContextSwitchTo(oldcontext);

Something funky with the whitespace here, but my eyes might also be
playing tricks on me. Mixing spaces in tabs like what do in this
codebase makes it difficult to review :).

> From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Thu, 30 Nov 2023 00:07:34 +0200
> Subject: [PATCH v3 7/7] Refactor postmaster child process launching

> + entry_kinds[child_type].main_fn(startup_data, startup_data_len);
> + Assert(false);

Seems like you want the pg_unreachable() macro here instead of
Assert(false). Similar comment at the end of SubPostmasterMain().

> + if (fwrite(param, paramsz, 1, fp) != 1)
> + {
> + ereport(LOG,
> + (errcode_for_file_access(),
> + errmsg("could not write to file \"%s\": %m", tmpfilename)));
> + FreeFile(fp);
> + return -1;
> + }
> +
> + /* Release file */
> + if (FreeFile(fp))
> + {
> + ereport(LOG,
> + (errcode_for_file_access(),
> + errmsg("could not write to file \"%s\": %m", tmpfilename)));
> + return -1;
> + }

Two pieces of feedback here. I generally find write(2) more useful than
fwrite(3) because write(2) will report a useful errno, whereas fwrite(2)
just uses ferror(3). The additional errno information might be valuable
context in the log message. Up to you if you think it is also valuable.

The log message if FreeFile() fails doesn't seem to make sense to me.
I didn't see any file writing in that code path, but it is possible that
I missed something.

> + /*
> + * Set reference point for stack-depth checking. This might seem
> + * redundant in !EXEC_BACKEND builds; but it's not because the postmaster
> + * launches its children from signal handlers, so we might be running on
> + * an alternative stack. XXX still true?
> + */
> + (void) set_stack_base();

Looks like there is still this XXX left. Can't say I completely
understand the second sentence either.

> + /*
> + * make sure stderr is in binary mode before anything can possibly be
> + * written to it, in case it's actually the syslogger pipe, so the pipe
> + * chunking protocol isn't disturbed. Non-logpipe data gets translated on
> + * redirection (e.g. via pg_ctl -l) anyway.
> + */

Nit: The 'm' in the first "make" should be capitalized.

> + if (fread(&param, sizeof(param), 1, fp) != 1)
> + {
> + write_stderr("could not read from backend variables file \"%s\": %s\n",
> + id, strerror(errno));
> + exit(1);
> + }
> +
> + /* read startup data */
> + *startup_data_len = param.startup_data_len;
> + if (param.startup_data_len > 0)
> + {
> + *startup_data = palloc(*startup_data_len);
> + if (fread(*startup_data, *startup_data_len, 1, fp) != 1)
> + {
> + write_stderr("could not read startup data from backend variables file \"%s\": %s\n",
> + id, strerror(errno));
> + exit(1);
> + }
> + }

fread(3) doesn't set errno. I would probably switch these to read(2) for
the reason I wrote in a previous comment.

> + /*
> + * Need to reinitialize the SSL library in the backend, since the context
> + * structures contain function pointers and cannot be passed through the
> + * parameter file.
> + *
> + * If for some reason reload fails (maybe the user installed broken key
> + * files), soldier on without SSL; that's better than all connections
> + * becoming impossible.
> + *
> + * XXX should we do this in all child processes? For the moment it's
> + * enough to do it in backend children.
> + */
> +#ifdef USE_SSL
> + if (EnableSSL)
> + {
> + if (secure_initialize(false) == 0)
> + LoadedSSL = true;
> + else
> + ereport(LOG,
> + (errmsg("SSL configuration could not be loaded in child process")));
> + }
> +#endif

Do other child process types do any non-local communication?

> -typedef struct ClientSocket {
> +struct ClientSocket
> +{
> pgsocket sock; /* File descriptor */
> SockAddr laddr; /* local addr (postmaster) */
> SockAddr raddr; /* remote addr (client) */
> -} ClientSocket;
> +};
> +typedef struct ClientSocket ClientSocket;

Can't say I completely understand the reason for this change given it
was added in your series.

I didn't look too hard at the Windows-specific code, so maybe someone
who knows Windows will have something to say, but it also might've just
been copy-paste that I didn't realize.

There were a few more XXXs that probably should be figured out before
committing. Though perhaps some of them were already there.

Patches 1-3 seem committable as-is. I only had minor comments on
everything but 7, so after taking a look at those, they could be
committed.

Overall, this seems liked a marked improvement :).

--
Tristan Partin
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Krishnakumar R 2023-11-30 18:54:12 Add missing error codes to PANIC/FATAL error reports in xlogrecovery
Previous Message Matthias van de Meent 2023-11-30 17:47:39 Re: Parallel CREATE INDEX for BRIN indexes