Re: Using WaitEventSet in the postmaster

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using WaitEventSet in the postmaster
Date: 2023-01-06 22:08:36
Message-ID: CA+hUKGKt58vOw8Y32Cy_Lnj8K2FGrty6-XO+bmkRnyuz5w-wJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 23, 2022 at 8:46 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I pushed the small preliminary patches. Here's a rebase of the main patch.

Here are some questions I have considered. Anyone got an opinion
on point 3, in particular?

1. Is it OK that we are now using APIs that might throw, in places
where we weren't? I think so: we don't really expect WaitEventSet
APIs to throw unless something is pretty seriously wrong, and if you
hack things to inject failures there then you get a FATAL error and
the postmaster exits and the children detect that. I think that is
appropriate.

2. Is it really OK to delete the pqsignal_pm() infrastructure? I
think so. The need for sa_mask to block all signals is gone: all
signal handlers should now be re-entrant (ie safe in case of
interruption while already in a signal handler), safe against stack
overflow (pqsignal() still blocks re-entry for the *same* signal
number, because we use sigaction() without SA_NODEFER, so a handler
can only be interrupted by a different signal, and the number of
actions installed is finite and small), and safe to run at any time
(ie safe to interrupt the user context because we just do known-good
sigatomic_t/syscall stuff and save/restore errno). The concern about
SA_RESTART is gone, because we no longer use the underspecified
select() interface; the replacement implementation syscalls, even
poll(), return with EINTR for handlers installed with SA_RESTART, but
that's now moot anyway because we have a latch that guarantees they
return with a different event anyway. (FTR select() is nearly extinct
in BE code, I found one other user and I plan to remove it, see RADIUS
thread, CF #4103.)

3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT? If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request(). I
think that's probably acceptable? To be strict about processing only
the first one that is delivered, I think you'd need an sa_mask to
block all three signals, and then you wouldn't change
pending_shutdown_request if it's already set, which I'm willing to
code up if someone thinks that's important. (<vapourware>Ideally I
would invent WL_SIGNAL to consume signal events serially without
handlers or global variables</vapourware>.)

4. Is anything new leaking into child processes due to this new
infrastructure? I don't think so; the postmaster's MemoryContext is
destroyed, and before that I'm releasing kernel resources on OSes that
need it (namely Linux, where the epoll fd and signalfd need to be
closed).

5. Is the signal mask being correctly handled during forking? I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want. (While thinking about that I noticed that
signals don't seem to be initially masked on Windows; I think that's a
pre-existing condition, and I assume we get away with it because
nothing reaches the fake signal dispatch code soon enough to break
anything? Not changing that in this patch.)

6. Is the naming and style OK? Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags. Maybe "action" isn't the best name;
but it could be a request from pg_ctl or a request from a child
process. I went with newly invented names for these handlers rather
than "handle_SIGUSR1" etc because (1) the 3 different shutdown request
signals point to a common handler and (2) I hope to switch to latches
instead of SIGUSR1 for "action" in later work. But I could switch to
got_SIGUSR1 style variables if someone thinks it's better.

Here's a new version, with small changes:
* remove a stray reference to select() in a pqcomm.c comment
* move PG_SETMASK(&UnBlockSig) below the bit that sets up SIGTTIN etc
* pgindent

Attachment Content-Type Size
v8-0001-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch application/x-patch 23.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dag Lem 2023-01-06 22:27:28 Re: daitch_mokotoff module
Previous Message Bruce Momjian 2023-01-06 21:56:11 Re: GROUP BY ALL