Re: Parallel query vs smart shutdown and Postmaster death

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel query vs smart shutdown and Postmaster death
Date: 2019-02-26 22:43:55
Message-ID: CA+hUKG+MF0G7f8UKvTWiGs4iFng5bA_jL8RT4X2WdhP+oE8gkg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 25, 2019 at 2:13 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> 1. In a nearby thread, I misdiagnosed a problem reported[1] by Justin
> Pryzby (though my misdiagnosis is probably still a thing to be fixed;
> see next). I think I just spotted the real problem he saw: if you
> execute a parallel query after a smart shutdown has been initiated,
> you wait forever in gather_readnext()! Maybe parallel workers can't
> be launched in this state, but we lack code to detect this case? I
> haven't dug into the exact mechanism or figured out what to do about
> it yet, and I'm tied up with something else for a bit, but I will come
> back to this later if nobody beats me to it.

Given smart shutdown's stated goal, namely that it "lets existing
sessions end their work normally", my questions are:

1. Why does pmdie()'s SIGTERM case terminate parallel workers
immediately? That aborts running parallel queries, so they
don't get to end their work normally.
2. Why are new parallel workers not allowed to be started while in
this state? That hangs future parallel queries forever, so they don't
get to end their work normally.
3. Suppose we fix the above cases; should we do it for parallel
workers only (somehow), or for all bgworkers? It's hard to say since
I don't know what all bgworkers do.

In the meantime, perhaps we should teach the postmaster to report this
case as a failure to launch in back-branches, so that at least
parallel queries don't hang forever? Here's an initial sketch of a
patch like that: it gives you "ERROR: parallel worker failed to
initialize" and "HINT: More details may be available in the server
log." if you try to run a parallel query. The HINT is right: the
server logs say that a smart shutdown is in progress. If that seems a
bit hostile, consider that any parallel queries that were running at
the moment the smart shutdown was requested have already been ordered
to quit; why should new queries started after that get a better deal?
Then perhaps we could do some more involved surgery on master that
achieves smart shutdown's stated goal here, and lets parallel queries
actually run? Better ideas welcome.
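
To make the difference between "hang forever" and "report a failure to
launch" concrete, here is a toy model in C. It is only a sketch under
loud assumptions: it is not the attached patch and not postmaster code,
and every type and function name in it (ToyPmState, ToyWorkerSlot,
toy_postmaster_service, toy_leader_wait) is made up for illustration.
The leader registers one worker and polls its slot; the only thing that
varies is whether the toy postmaster marks the slot as failed when it
refuses to start workers during a smart shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef enum { TOY_PM_RUN, TOY_PM_SMART_SHUTDOWN } ToyPmState;

typedef enum {
    TOY_WORKER_PENDING,  /* registered, not started yet */
    TOY_WORKER_STARTED,  /* toy postmaster launched it */
    TOY_WORKER_FAILED    /* toy postmaster reported it will not launch */
} ToyWorkerStatus;

typedef struct { ToyWorkerStatus status; } ToyWorkerSlot;

typedef enum {
    TOY_LEADER_GOT_WORKER,
    TOY_LEADER_ERROR,          /* "parallel worker failed to initialize" */
    TOY_LEADER_STILL_WAITING   /* stands in for waiting forever */
} ToyLeaderResult;

/* One pass of the toy postmaster over a single registered worker. */
static void
toy_postmaster_service(ToyPmState state, ToyWorkerSlot *slot,
                       bool report_failure)
{
    assert(slot != NULL);

    if (slot->status != TOY_WORKER_PENDING)
        return;

    if (state == TOY_PM_RUN)
        slot->status = TOY_WORKER_STARTED;
    else if (report_failure)
        slot->status = TOY_WORKER_FAILED;
    /* otherwise the slot stays pending and nobody tells the leader */
}

/*
 * The leader registers a worker, then polls at most max_polls times.
 * A real leader has no such bound; the bound exists only so that the
 * hang shows up as a return value instead of an endless loop.
 */
static ToyLeaderResult
toy_leader_wait(ToyPmState state, bool report_failure, int max_polls)
{
    ToyWorkerSlot slot = { TOY_WORKER_PENDING };

    for (int i = 0; i < max_polls; i++)
    {
        toy_postmaster_service(state, &slot, report_failure);
        if (slot.status == TOY_WORKER_STARTED)
            return TOY_LEADER_GOT_WORKER;
        if (slot.status == TOY_WORKER_FAILED)
            return TOY_LEADER_ERROR;
    }
    return TOY_LEADER_STILL_WAITING;
}
```

In this model, toy_leader_wait(TOY_PM_SMART_SHUTDOWN, false, n) returns
TOY_LEADER_STILL_WAITING for any n, which is the hang, while passing
true for report_failure turns the same situation into TOY_LEADER_ERROR
on the first poll. Letting parallel queries actually run during smart
shutdown would instead amount to treating that state like TOY_PM_RUN
for parallel workers, which the model does not attempt.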

--
Thomas Munro
https://enterprisedb.com

Attachment: 0001-Report-bgworker-launch-failure-during-smart-shutdown.patch (application/octet-stream, 2.5 KB)
