Re: Using WaitEventSet in the postmaster

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using WaitEventSet in the postmaster
Date: 2022-12-06 11:58:06
Message-ID: CA+hUKGLU6t-17dywkMADMBdPHhfvnHaS2JzWbu2Fd80220hrpw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 6, 2022 at 7:09 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-12-05 22:45:57 +1300, Thomas Munro wrote:
> > The reason for the existing sleep-based approach was that we didn't
> > want to accept any more connections (or spin furiously because the
> > listen queue was non-empty). So in this version I invented a way to
> > suppress socket events temporarily with WL_SOCKET_IGNORE, and then
> > reactivate them after crash reinit.
>
> That seems like an odd special flag. Why do we need it? Is it just because we
> want to have assertions ensuring that something is being queried?

Yeah. Perhaps 0 would be a less clumsy way to say "no events please".
I removed the assertions and did it that way in this next iteration.

I realised that the previous approach didn't actually suppress POLLHUP
and POLLERR in the poll and epoll implementations (our code seems to
think it needs to ask for those events, but that's not necessary: you
get them anyway). Since those interfaces are level-triggered, if
either of those events were ever reported we'd finish up pegging the
CPU at 100% until the children exited. That's unlikely to happen with
a server socket, but wrong on principle, and maybe a problem for other
potential users of this temporary event suppression mode.

One way to fix that for the epoll version is to EPOLL_CTL_DEL and
EPOLL_CTL_ADD whenever transitioning to/from a zero event mask. I
tried it like that in this version. Another approach would be to
(finally) write DeleteWaitEvent() to do the same thing at a higher
level... but that seems like overkill for this.

The kqueue version was already doing that because of the way it was
implemented, and the poll and Windows versions needed only a small
adjustment. I'm not too sure about the Windows change; my two ideas
are passing the 0 through as shown in this version (not sure if it
really works the way I want, but it makes some sense and the
WSAEventSelect() call doesn't fail...), or sticking a dummy unsignaled
event in the array passed to WaitForMultipleObjects().

To make sure this code is exercised, I made the state machine code
eager about silencing the socket events during PM_WAIT_DEAD_END, so
crash TAP tests go through the cycle. Regular non-crash shutdown also
runs EPOLL_CTL_DEL/EV_DELETE, which stands out if you trace the
postmaster.

> > * WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix,
> > this is just another name for WL_SOCKET_READABLE, but Windows has a
> > different underlying event; this mirrors WL_SOCKET_CONNECTED on the
> > other end of a connection)
>
> Perhaps worth committing separately and soon? Seems pretty uncontroversial
> from here.

Alright, I split this into a separate patch.

> > +/*
> > + * Object representing the state of a postmaster.
> > + *
> > + * XXX Lots of global variables could move in here.
> > + */
> > +typedef struct
> > +{
> > + WaitEventSet *wes;
> > +} Postmaster;
> > +
>
> Seems weird to introduce this but then basically have it be unused. I'd say
> either have a preceding patch move at least a few members into it, or just
> omit it for now.

Alright, for now I'll just make a global variable wait_set, to keep
things simple.

> > + /* This may configure SIGURG, depending on platform. */
> > + InitializeLatchSupport();
> > + InitLocalLatch();
>
> I'm mildly preferring InitProcessLocalLatch(), but not sure why - there's not
> really a conflicting meaning of "local" here.

Done.

> > +/*
> > + * Initialize the WaitEventSet we'll use in our main event loop.
> > + */
> > +static void
> > +InitializeWaitSet(Postmaster *postmaster)
> > +{
> > + /* Set up a WaitEventSet for our latch and listening sockets. */
> > + postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
> > + AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
> > + for (int i = 0; i < MAXLISTEN; i++)
> > + {
> > + int fd = ListenSocket[i];
> > +
> > + if (fd == PGINVALID_SOCKET)
> > + break;
> > + AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
> > + }
> > +}
>
> The naming seems like it could be confused with latch.h
> infrastructure. InitPostmasterWaitSet()?

OK.

> > +/*
> > + * Activate or deactivate the server socket events.
> > + */
> > +static void
> > +AdjustServerSocketEvents(Postmaster *postmaster, bool active)
> > +{
> > + for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); ++pos)
> > + ModifyWaitEvent(postmaster->wes,
> > + pos, active ? WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE,
> > + NULL);
> > +}
>
> This seems to hardcode the specific wait events we're waiting for based on
> latch.c infrastructure. Not really convinced that's a good idea.

What are you objecting to? The assumption that the first socket is at
position 1? The use of GetNumRegisteredWaitEvents()?

> > + /* Process work scheduled by signal handlers. */
> > + if (pending_action_request)
> > + process_action_request(postmaster);
> > + if (pending_child_exit)
> > + process_child_exit(postmaster);
> > + if (pending_reload_request)
> > + process_reload_request();
> > + if (pending_shutdown_request)
> > + process_shutdown_request(postmaster);

> Is the order of operations here quite right? Shouldn't we process a shutdown
> request before the others? And a child exit before the request to start an
> autovac worker etc?
>
> ISTM it should be 1) shutdown request 2) child exit 3) config reload 4) action
> request.

OK, reordered like that.

> > - ereport(DEBUG2,
> > - (errmsg_internal("postmaster received signal %d",
> > - postgres_signal_arg)));

> Hm, not having the "postmaster received signal" message anymore seems like a
> loss when debugging things. I think process_shutdown_request() should emit
> something like it.

I added some of these.

> I wonder if we should have a elog_sighand() that's written to be signal
> safe. I've written versions of that numerous times for debugging, and it's a
> bit silly to do that over and over again.

Right, I was being dogmatic about kicking everything that doesn't have
a great big neon "async-signal-safe" sign on it out of the handlers.

> > +
> > + /* start accepting server socket connection events again */
> > + reenable_server_socket_events = true;
> > }
> > }
>
> I don't think reenable_server_socket_events does anything as the patch stands
> - I don't see it being checked anywhere? And in the path above, you're using
> AdjustServerSocketEvents() directly.

Sorry, that was a left over unused variable from an earlier attempt,
which I only noticed after clicking send. Removed.

> > @@ -4094,6 +4130,7 @@ BackendStartup(Port *port)
> > /* Hasn't asked to be notified about any bgworkers yet */
> > bn->bgworker_notify = false;
> >
> > + PG_SETMASK(&BlockSig);
> > #ifdef EXEC_BACKEND
> > pid = backend_forkexec(port);
> > #else /* !EXEC_BACKEND */
>
> There are other calls to fork_process() - why don't they need the same
> treatment?
>
> Perhaps we should add an assertion to fork_process() ensuring that signals are
> masked?

If we're going to put an assertion in there, we might as well consider
setting and restoring the mask in that wrapper. Tried like that in
this version.

> > diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> > index eb3a569aae..3bfef592eb 100644
> > --- a/src/backend/storage/ipc/latch.c
> > +++ b/src/backend/storage/ipc/latch.c
> > @@ -283,6 +283,17 @@ InitializeLatchSupport(void)
> > #ifdef WAIT_USE_SIGNALFD
> > sigset_t signalfd_mask;
> >
> > + if (IsUnderPostmaster)
> > + {
> > + if (signal_fd != -1)
> > + {
> > + /* Release postmaster's signal FD; ignore any error */
> > + (void) close(signal_fd);
> > + signal_fd = -1;
> > + ReleaseExternalFD();
> > + }
> > + }
> > +
>
> Hm - arguably it's a bug that we don't do this right now, correct?

Yes, I would say it's a non-live bug. A signalfd descriptor inherited
by a child process isn't dangerous (it doesn't see the parent's
signals, it sees the child's signals), but it's a waste because we'd
leak it. I guess we could re-use it instead but that seems a little
weird. I've put this into a separate commit in case someone wants to
argue for back-patching, but it's a pretty hypothetical concern since
the postmaster never initialised latch support before...

One thing that does seem a bit odd to me, though, is why we're
cleaning up inherited descriptors in a function called
InitializeLatchSupport(). I wonder if we should move it into
FreeLatchSupportAfterFork()?

We should also close the postmaster's epoll fd, so I invented
FreeWaitEventSetAfterFork(). I found that ClosePostmasterPorts() was
a good place to call that, though it doesn't really fit the name of
that function too well...

> > @@ -1201,6 +1214,7 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
> > event->events == WL_POSTMASTER_DEATH ||
> > (event->events & (WL_SOCKET_READABLE |
> > WL_SOCKET_WRITEABLE |
> > + WL_SOCKET_IGNORE |
> > WL_SOCKET_CLOSED)));
> >
> > if (event->events == WL_POSTMASTER_DEATH)
> > @@ -1312,6 +1326,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
> > flags |= FD_WRITE;
> > if (event->events & WL_SOCKET_CONNECTED)
> > flags |= FD_CONNECT;
> > + if (event->events & WL_SOCKET_ACCEPT)
> > + flags |= FD_ACCEPT;
> >
> > if (*handle == WSA_INVALID_EVENT)
> > {
>
> I wonder if the code would end up easier to understand if we handled
> WL_SOCKET_CONNECTED, WL_SOCKET_ACCEPT explicitly in the !WIN32 cases, rather
> than redefining it to WL_SOCKET_READABLE.

Yeah maybe we could try that separately.

> > diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
> > index 3082093d1e..655e881688 100644
> > --- a/src/backend/tcop/postgres.c
> > +++ b/src/backend/tcop/postgres.c
> > @@ -24,7 +24,6 @@
> > #include <signal.h>
> > #include <unistd.h>
> > #include <sys/resource.h>
> > -#include <sys/select.h>
> > #include <sys/socket.h>
> > #include <sys/time.h>
>
> Do you know why this include even existed?

That turned out to be a fun question to answer: apparently there used
to be an optional 'multiplexed backend' mode, removed by commit
d5bbe2aca5 in 1998. A single backend could be connected to multiple
frontends.

Attachment Content-Type Size
v4-0001-Add-WL_SOCKET_ACCEPT-event-to-WaitEventSet-API.patch text/x-patch 3.5 KB
v4-0002-Don-t-leak-a-signalfd-when-using-latches-in-the-p.patch text/x-patch 1.5 KB
v4-0003-Allow-parent-s-WaitEventSets-to-be-freed-after-fo.patch text/x-patch 2.1 KB
v4-0004-Allow-socket-WaitEvents-to-be-temporarily-blocked.patch text/x-patch 3.8 KB
v4-0005-Give-the-postmaster-a-WaitEventSet-and-a-latch.patch text/x-patch 22.6 KB
