Re: Parallel query hangs after a smart shutdown is issued

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject: Re: Parallel query hangs after a smart shutdown is issued
Date: 2020-08-12 16:56:03
Message-ID: CA+hUKGL0_PqgAc9xEa-gZqtgYY0ykeJW=oWsT3g9z9LURozqTg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> After a smart shutdown is issued (with pg_ctl), run a parallel query,
> and the query hangs. The postmaster doesn't inform backends about the
> smart shutdown (see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not
> informed), so if they request parallel workers, the postmaster is
> unable to fork any workers because its status (pmState) has changed to
> PM_WAIT_BACKENDS (see maybe_start_bgworkers() -->
> bgworker_should_start_now() returns false).
>
> A few ways we could solve this:
> 1. Do we want to disallow parallelism when there is a pending smart
> shutdown? If yes, then we can have the postmaster notify the regular
> backends whenever a smart shutdown is received, and the backends use
> this info to not consider parallelism. If we use SIGTERM to notify,
> since the backends have die() as the handler, they just cancel the
> queries, which is again an inconsistent behaviour[1]. Could any other
> signal, like SIGUSR2 (I think it's currently ignored by backends), be
> used here? If the signals are overloaded, can we multiplex SIGTERM
> similar to SIGUSR1? If we don't want to use signals at all, the
> postmaster can record its status in bgworker shared memory,
> i.e. BackgroundWorkerData, so that RegisterDynamicBackgroundWorker() can
> simply return without requesting parallel workers from the postmaster.
>
> 2. If we want to allow parallelism, then, we can tweak
> bgworker_should_start_now(), detect that the pending bg worker fork
> requests are for parallelism, and let the postmaster start the
> workers.
>
> Thoughts?

Hello Bharath,

Yeah, the current situation is not good. I think your option 2 sounds
better, because the documented behaviour of smart shutdown is that it
"lets existing sessions end their work normally". I think that means
that a query that is already running or allowed to start should be
able to start new workers and not have its existing workers
terminated. Arseny Sher wrote a couple of different patches to try
that last year, but they fell through the cracks:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
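
To make option 2 concrete, here is a minimal, self-contained sketch of the
decision being discussed. It is not the postmaster code and not one of the
patches mentioned above: the enum, the function names and the
is_parallel_worker flag are all hypothetical stand-ins, and the real
bgworker_should_start_now() handles more states and takes different
arguments. It only shows the difference between refusing every worker once a
smart shutdown is pending and still allowing workers requested by a running
parallel query.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-ins for two postmaster states.
 * This is NOT the real PMState enum, which has many more members.
 */
typedef enum
{
    MODEL_PM_RUN,           /* normal operation */
    MODEL_PM_WAIT_BACKENDS  /* smart shutdown pending, sessions still running */
} ModelPmState;

/*
 * Behaviour described in the report: once the state has left "run",
 * no background worker is started, so a parallel query waits forever.
 */
bool
model_should_start_now(ModelPmState state)
{
    return state == MODEL_PM_RUN;
}

/*
 * Option 2 (assumed shape): while waiting for sessions to finish,
 * still start workers that were requested for a parallel query.
 * How the postmaster would recognise such a request is left open here.
 */
bool
model_should_start_now_option2(ModelPmState state, bool is_parallel_worker)
{
    if (state == MODEL_PM_RUN)
        return true;
    return state == MODEL_PM_WAIT_BACKENDS && is_parallel_worker;
}
```

With this model, a parallel worker requested during MODEL_PM_WAIT_BACKENDS
is refused by the first function and accepted by the second, while other
workers are still refused, which matches the idea of letting existing
sessions end their work normally.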
