Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>
Subject: Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Date: 2020-10-23 04:06:50
Message-ID: CAGRY4nxsAe_1k_9g5b47orA0S011iBoHsXHFMH7cg7HV0O1bwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 7, 2020 at 8:39 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <
> masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > > >
> > > > +1 Or it's also ok to make each patch separately.
> > > > Anyway, thanks!
> > > >
> > >
> > > Thanks. +1 to have separate patches. I will post two separate patches
> > > for autoprewarm and WalRcvShutdownHandler for further review. The
> > > other two patches can be for startup.c and syslogger.c.
>

[ CC'd Robert Haas since he's the main name in interrupt.c,
test_shm_mq/worker.c, ]

src/test/modules/test_shm_mq/worker.c appears to do the right thing the
wrong way - it has its own custom handler instead of using die() or
SignalHandlerForShutdownRequest().

In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong.
Especially since it's quoted as an example of how to do things right. It
won't respond to SIGTERM at all while it's executing a query from its
queue, no matter how long that query takes or whether it blocks. It can
inhibit even postmaster shutdown as a result.

I was going to lob off a quick patch to fix this by making both use
quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit
I'm no longer sure what the right choice even is. I'd welcome some opinions.

The problem is that but interrupt.c and interrupt.h actually define and
recommend different and simpler handlers for these jobs - ones that don't
actually work properly for code that calls into Pg core and might rely on
CHECK_FOR_INTERRUPTS(), InterruptsPending and ProcDiePending to properly
respect SIGTERM.

And to add to the confusion the bgworker infra adds its own different
default SIGTERM handler bgworker_die() that's weirdly in-between the
interrupt.c and postgres.c signal handling.

So I'm no longer sure how the example code should even be fixed. I'm not
convinced everyone using die() and quickdie() is good given they currently
seem to be assumed to be mainly for user backends. Maybe wwe should move
them to interrupt.c along with CHECK_FOR_INTERRUPTS(), ProcessInterrupts,
etc and document them as for all database-connected or shmem-connected
backends to use.

So in the medium term, interrupt.c's SignalHandlerForShutdownRequest() and
SignalHandlerForCrashExit() should be combined with die() and quickdie(),
integrating properly with CHECK_FOR_INTERRUPTS(), ProcessInterrupts(), etc.
We can add a hook we call before we proc_exit() in response to
ProcDiePending so backends can choose to mask it if there's a period during
which they wish to defer responding to SIGTERM, but by default everything
will respect SIGTERM -> die() sets ProcDiePending -> CHECK_FOR_INTERRUPTS()
-> ProcessInterrupts() -> proc_exit() . interrupt.c's
SignalHandlerForCrashExit() and SignalHandlerForShutdownRequest() become
deprecated/legacy. We add a separate temporary handler that's installed by
init.c for early SIGQUIT handling but document it as to be replaced after
backends start properly. We'd delete the bgw-specific signal handlers and
install die() and procdie() instead during StartBackgroundWorker - at least
if the bgw is connecting to shmem or a database. interrupt.c's
HandleMainLoopInterrupts() could be static inlined, and adopted in the
bgworker examples and all the other places that currently do
ConfigReloadPending / ProcessConfigFile() etc themselves.

It wouldn't be a clean sweep of consistent signal handling, given all the
funky stuff we have in the checkpointer, walsender, etc. But I think it
might help...

(And maybe I could even combine the various am_foo and is_bar globals we
use to identify different sorts of backend, while doing such cleanups).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Lawrence Barwick 2020-10-23 04:13:56 Re: "unix_socket_directories" should be GUC_LIST_INPUT?
Previous Message Tom Lane 2020-10-23 03:56:35 Re: "unix_socket_directories" should be GUC_LIST_INPUT?