Re: crashes due to setting max_parallel_workers=0

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: crashes due to setting max_parallel_workers=0
Date: 2017-03-26 22:13:09
Message-ID: e96d3112-26dd-d77d-7f8f-19b512ec4a02@2ndquadrant.com
Lists: pgsql-hackers

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>
>
> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>
> On 3/25/17 09:01, David Rowley wrote:
> > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> >> Another point which I think we should fix: when someone sets
> >> max_parallel_workers = 0, we should also set
> >> max_parallel_workers_per_gather to zero. That way we can avoid
> >> generating a gather path with max_parallel_workers = 0.
> > I see that it was actually quite useful that it works the way it does.
> > If it had worked the same as max_parallel_workers_per_gather, then
> > likely Tomas would never have found this bug.
>
> Another problem is that the GUC system doesn't really support cases
> where the validity of one setting depends on the current value of
> another setting. So each individual setting needs to be robust against
> cases of related settings being nonsensical.
>
>
> Okay.
>
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not
> returning a cleared slot when nreaders is zero (i.e. when
> nworkers_launched = 0). Because of that, the scan continued even
> after all tuples were exhausted, ending in a server crash in
> gather_merge_getnext(). In the patch I also added an Assert to
> gather_merge_getnext() checking that the index is less than
> nreaders + 1 (the leader).
>
> PFA simple patch to fix the problem.
>

I think there are two issues at play here: the first is that we still
produce parallel plans even with max_parallel_workers=0, and the second
is the crash in GatherMerge when nworkers=0.

Your patch fixes the latter (thanks for looking into it), which is
obviously a good thing: getting 0 workers on a busy system is quite
possible, because all the parallel workers may already be chewing on
some other query.
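
For concreteness, the Assert described upthread would presumably take
roughly this shape (a sketch based only on the description in this
thread, not on the actual patch; the index variable name is made up):

    /*
     * A GatherMerge node merges tuples from nreaders worker queues
     * plus the leader itself, so any tuple-source index must stay
     * below nreaders + 1.
     */
    Assert(i < gm_state->nreaders + 1);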

But it seems a bit futile to produce the parallel plan in the first
place, because with max_parallel_workers=0 we can't possibly get any
parallel workers ever. I wonder why compute_parallel_worker() only looks
at max_parallel_workers_per_gather, i.e. why shouldn't it do:

parallel_workers = Min(parallel_workers, max_parallel_workers);
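
In context, that would amount to adding a second clamp next to the
existing per-gather one (a sketch; the exact placement inside
compute_parallel_worker() might differ):

    /* existing clamp against the per-gather limit */
    parallel_workers = Min(parallel_workers, max_parallel_workers_per_gather);

    /* proposed additional clamp against the global worker pool */
    parallel_workers = Min(parallel_workers, max_parallel_workers);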

Perhaps this was discussed and is actually intentional, though.

Regarding handling this at the GUC level, I agree with Peter that it's
not a good idea. I suppose we could check the values in the GUC
check/assign hooks, but what we don't have is a way to undo the changes
in all the related GUCs. That is, if I do

SET max_parallel_workers = 0;
SET max_parallel_workers = 16;

I expect to end up with just the max_parallel_workers GUC changed and
nothing else.
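
To illustrate the problem, a hypothetical assign hook cascading the
change (the hook name and behavior here are made up, not proposed code)
would look something like this:

    static void
    assign_max_parallel_workers(int newval, void *extra)
    {
        /*
         * Hypothetical cascade: zeroing the global pool also zeroes the
         * per-gather limit.  After the two SETs above, nothing restores
         * the per-gather value the first SET clobbered, which is exactly
         * the missing "undo" capability.
         */
        if (newval == 0)
            max_parallel_workers_per_gather = 0;
    }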

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
