Re: Parallel query hangs after a smart shutdown is issued

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject: Re: Parallel query hangs after a smart shutdown is issued
Date: 2020-08-12 18:00:55
Message-ID: 272331.1597255255@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>> After a smart shutdown is issued(with pg_ctl), run a parallel query,
>> then the query hangs.

> Yeah, the current situation is not good. I think your option 2 sounds
> better, because the documented behaviour of smart shutdown is that it
> "lets existing sessions end their work normally". I think that means
> that a query that is already running or allowed to start should be
> able to start new workers and not have its existing workers
> terminated. Arseny Sher wrote a couple of different patches to try
> that last year, but they fell through the cracks:
> https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com

I already commented on this in the other thread that Bharath started [1].
I think the real issue here is why is the postmaster's SIGTERM handler
doing *anything* other than disallowing new connections? It seems quite
premature to kill support processes of any sort, not only parallel
workers. The documentation says existing clients are allowed to end
their work, not that their performance is going to be crippled until
they end.

So I looked at moving the kills of all the support processes to happen
after we detect that there are no remaining regular backends, and it
seems to not be too hard. I realized that the existing PM_WAIT_READONLY
state is doing that already, but just for a subset of support processes
that it thinks might be active in hot standby mode. So what I did in the
attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the
right thing in either regular or hot standby mode.

One other thing I changed here was to remove PM_WAIT_READONLY from the
set of states in which we'll allow promotion to occur or a new walreceiver
to start. I'm not convinced that either of those behaviors aren't
bugs; although if someone thinks they're right, we can certainly put
back PM_WAIT_CLIENTS in those checks. (But, for example, it does not
appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
like confusingly dead code to me. If we do want to allow restarting
the walreceiver in this state, the Shutdown condition needs fixed.)

regards, tom lane

[1] https://www.postgresql.org/message-id/65189.1597181322%40sss.pgh.pa.us

Attachment Content-Type Size
keep-bg-processes-alive-in-smart-shutdown-1.patch text/x-diff 7.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-08-12 18:02:18 Re: [BUG] Error in BRIN summarization
Previous Message Jaime Casanova 2020-08-12 17:54:45 Re: EDB builds Postgres 13 with an obsolete ICU version