Inconsistent behavior of smart shutdown handling for queries with and without parallel workers

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Inconsistent behavior of smart shutdown handling for queries with and without parallel workers
Date: 2020-08-11 13:20:27
Message-ID: CALj2ACWTAQ2uWgj4yRFLQ6t15MMYV_uc3GCT5F5p8R9pzrd7yQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In case of smart shutdown postmaster receives SIGTERM from the pg_ctl,
it "disallows new connections, but lets existing sessions end their
work normally". Which means that it doesn't abort any ongoing txns in
any of the sessions and it lets the sessions to exit(on their own) and
then the postmaster is shut down. Looks like this behavior is true
only if the sessions are executing non-parallel queries. Parallel
queries are getting aborted, see [1].

Although the postmaster receives two different signals for
smart(SIGTERM) and fast(SIGINT) shutdowns, it only sends SIGTERM to
bgworkers for both the cases. (see pmdie() -> SignalSomeChildren() in
postmaster.c). In
StartBackgroundWorker(), bgworkers have the bgworker_die() as default
handler for SIGTERM, which just reports FATAL error. Is this handler
correct for both fast and smart shutdown for all types of bgworkers?

For parallel workers in ParallelWorkerMain(), SIGTERM handler gets
changed to die()(which means for both smart and fast shutdowns, the
same handler gets used), which sets ProcDiePending = true; and later
if the parallel workers try to CHECK_FOR_INTERRUPTS(); (for parallel
seq scan, it is done in ExecScanFetch()), since ProcDiePending was set
to true, the parallel workers throw error "terminating connection due
to administrator command" in ProcessInterrupts().
Having die() as a handler for fast shutdown may be correct, but for
smart shutdown, as mentioned in $subject, it looks inconsistent.

1. In general, do we need to allow postmaster to send different
signals to bgworkers for fast and smart shutdowns and let them
differentiate the two modes(if needed)?
2. Do we need to handle smart shutdown for dynamic bgworkers(parallel
workers) with a different signal than SIGTERM and retain SIGTERM
handler die() as is, since SIGTERM is being sent to bgworkers from
other places as well? If we do so, we can block that new signal until
the parallel workers finish the current query execution or completely
ignore it in ParallelWorkerMain(). If the bgw_flags flag is
BGWORKER_CLASS_PARALLEL, we can do some changes in postmaster's
SignalSomeChildren() to detect and send that new signal. Looks like
SIGUSR2 remains currently ignored for dynamic bgworker, and can be
used for this purpose.

Thoughts?

Thanks @vignesh C for inputs and writeup review.

[1] (smart shutdown issued with pg_ctl, while the parallel query is running).
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM t_test t1, t_test t2
WHERE t1.many = t2.many;
ERROR: terminating connection due to administrator command
CONTEXT: parallel worker

[2] The usual behavior of: smart shutdown - lets existing sessions end
their work normally and fast shutdown - abort their current
transactions and exit promptly.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-11 14:03:29 Re: recovering from "found xmin ... from before relfrozenxid ..."
Previous Message Ashutosh Bapat 2020-08-11 13:06:03 Re: Can I test Extended Query in core test framework