Re: Parallel query vs smart shutdown and Postmaster death

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel query vs smart shutdown and Postmaster death
Date: 2019-03-18 03:55:02
Message-ID: CA+hUKG+7m3hvhf8LBRftnpX6FmYa282XtgFhjw=aLd1RBxi-Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 17, 2019 at 5:53 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>
> > 1. Why does pmdie()'s SIGTERM case terminate parallel workers
> > immediately? That breaks aborts running parallel queries, so they
> > don't get to end their work normally.
> > 2. Why are new parallel workers not allowed to be started while in
> > this state? That hangs future parallel queries forever, so they don't
> > get to end their work normally.
> > 3. Suppose we fix the above cases; should we do it for parallel
> > workers only (somehow), or for all bgworkers? It's hard to say since
> > I don't know what all bgworkers do.
>
> Attached patch fixes 1 and 2. As for the 3, the only other internal
> bgworkers currently are logical replication launcher and workers, which
> arguably should be killed immediately.

Hi Arseny,

Thanks for working on this! Yes, it seems better to move forwards
rather than backwards, and fix this properly as you have done in this
patch.

Just a thought: instead of the new hand-coded loop you added in
pmdie(), do you think it would make sense to have a new argument
"exclude_class_mask" for SignalSomeChildren()? If we did that, I
would consider renaming the existing parameter "target" to
"include_type_mask" to make it super clear what's going on.

> > Here's an initial sketch of a
> > patch like that: it gives you "ERROR: parallel worker failed to
> > initialize" and "HINT: More details may be available in the server
> > log." if you try to run a parallel query.
>
> Reporting bgworkers that postmaster is never going to start is probably
> worthwhile doing anyway to prevent potential further deadlocks like
> this. However, I think there is a problem in your patch: we might be in
> post PM_RUN states due to FatalError, not because of shutdown. In this
> case, we shouldn't refuse to run bgws in the future. I would also merge
> the check into bgworker_should_start_now.

Hmm, yeah, I haven't tested or double-checked in detail yet but I
think you're probably right about all of that.

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-18 04:02:35 Re: Offline enabling/disabling of data checksums
Previous Message Iwata, Aya 2019-03-18 03:03:04 RE: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead