Re: Redesigning postmaster death handling

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Redesigning postmaster death handling
Date: 2025-08-21 23:38:33
Message-ID: CA+hUKGJF28COMEPwD-mZEqvjh=6B07SS7YvJcaiX0TqOd4btGg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 21, 2025 at 5:45 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> One other thought here: do we *really* want such a critical-and-hard-
> to-test aspect of our behavior to be handled completely differently
> on different platforms? I'd lean to ignoring the Linux/FreeBSD
> facilities, because otherwise we're basically doubling our testing
> problems in exchange for not much.

Yeah. The attraction is that it's extremely simple and reliable:
set-and-forget, adding one line that sends you into well-tested
immediate shutdown code. Combined with the fact that most of our user
base has it, that seemed compelling. The reliability aspects I was
thinking of are: (1) the kernel's knowledge of the process tree is
infallible by definition, (2) it's handled asynchronously on
postmaster exit, not after a POLLHUP, EVFILT_PROC, or process
HANDLE event that must be consumed synchronously by at least one
child.
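
For illustration, the set-and-forget arrangement could look roughly like
the sketch below. It assumes Linux's prctl(PR_SET_PDEATHSIG) and
FreeBSD's procctl(PROC_PDEATHSIG_CTL); the helper name
request_parent_death_signal() is hypothetical and is not taken from the
v1 patch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/*
 * Hypothetical helper: ask the kernel to deliver 'signo' to the calling
 * process when its parent exits.  Returns 0 on success, -1 on failure or
 * on platforms without such a facility.
 *
 * On Linux the setting is cleared in the child of a fork(), so each child
 * has to make this request for itself after it starts.
 */
static int
request_parent_death_signal(int signo)
{
    assert(signo >= 0);
#if defined(__linux__)
    return prctl(PR_SET_PDEATHSIG, (unsigned long) signo);
#elif defined(__FreeBSD__)
    return procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signo);
#else
    (void) signo;
    return -1;              /* e.g. macOS: no equivalent facility */
#endif
}
```

A child would call something like request_parent_death_signal(SIGQUIT)
once during startup. The parent may already have exited before the
request is armed, so a caller would probably also want to compare
getppid() with the pid it expects afterwards.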

For (2), in practice I think it's close to 100% certain that at least
one backend is currently, or very soon will be, in WaitEventSetWait()
and thus drive the cleanup operation, and I think that's probably good
enough.
For example, even if your backends are all busy, there's basically
always a bunch of "launchers" and other auxiliary processes ready and
waiting to deal with it. But it's possible to dream up extreme
theoretical scenarios where that bet fails: imagine if every single
backend except for one is currently waiting for a lock in sem_wait()
(let's say it's the same lock for simplicity). I previously said in
some throwaway comment that they can't all be blocked in sem_wait() or
you already have a deadlock (a programming bug that isn't this
system's fault), but if the postmaster AND the backend that holds the
lock are killed by the OOM killer, you lose. Those backends would
need to be cleaned up manually by an administrator in all released
versions of PostgreSQL, and it'd be no better with the v1 patch on
Windows and macOS. They'd all eat SIGQUIT on a Linux or FreeBSD
system with the v1 patch, so on paper at least it's more hole-proof.
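
To make the synchronous side of (2) concrete, here is a minimal sketch
of pipe-based detection, assuming the parent holds the write end of a
pipe, never writes to it, and each child inherits the read end. The
name parent_seems_alive() is hypothetical. The point is that a child
only learns anything when it actually calls this, which a process
blocked in sem_wait() never does.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: non-blocking check of a "death watch" pipe.
 *
 * Nothing is ever written to the pipe, so the read end only becomes
 * ready (POLLHUP, or POLLIN reporting EOF, depending on the platform)
 * once every copy of the write end has been closed, i.e. once the
 * parent has exited.
 */
static bool
parent_seems_alive(int death_watch_fd)
{
    struct pollfd pfd = {.fd = death_watch_fd, .events = POLLIN};
    int         rc;

    assert(death_watch_fd >= 0);

    rc = poll(&pfd, 1, 0);      /* timeout 0: just peek, don't wait */
    if (rc < 0)
        return true;            /* e.g. EINTR: no evidence of death */
    return rc == 0;             /* no event means a writer still exists */
}
```

A real wait loop would include the same descriptor in its blocking
poll() set rather than peeking, but the limitation is the same:
somebody has to be in that loop to notice.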

I agree that it would be nice to have just one system though, and of
course to make it completely reliable everywhere without complicated
theories.

One argument I thought of against PROC_PDEATHSIG_CTL is that its
simplicity also takes away some possibilities. Yesterday I wrote
"taking over the role of the departed Postmaster", and realised it's
not the whole enchilada: do we also want the "issuing SIGKILL to
recalcitrant children" bit? I don't want this system to be
complicated, rather the opposite, but I wonder if there is a nice way
to make it run *literally* the same code as the postmaster. We'd need
bulletproof data structure sharing, or preferably, no sharing of
modifiable data at all. Some ideas I'm looking into: better use of
process groups, or maybe doing the bookkeeping in memory that is not
even mapped into children until they need it. Or something.
Researching...
