Re: crashes due to setting max_parallel_workers=0

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crashes due to setting max_parallel_workers=0
Date: 2017-03-27 16:36:57
Message-ID: CAGPqQf2=s2i578Lw5fao+EaN_FJXTXQ5MGdvJq8ECZofGOxveA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > On 28 March 2017 at 04:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> >> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> >>> About the original issue reported by Tomas, I did more debugging and
> >>> found that - problem was gather_merge_clear_slots() was not returning
> >>> the clear slot when nreader is zero (means nworkers_launched = 0).
> >>> Due to the same scan was continue even all the tuple are exhausted,
> >>> and then end up with server crash at gather_merge_getnext(). In the
> patch
> >>> I also added the Assert into gather_merge_getnext(), about the index
> >>> should be less then the nreaders + 1 (leader).
> >>
> >> Well, you and David Rowley seem to disagree on what the fix is here.
> >> His patches posted upthread do A, and yours do B, and from a quick
> >> look those things are not just different ways of spelling the same
> >> underlying fix, but actually directly conflicting ideas about what the
> >> fix should be. Any chance you can review his patches, and maybe he
> >> can review yours, and we could try to agree on a consensus position?
> >> :-)
> >
> > When comparing Rushabh's to my minimal patch, both change
> > gather_merge_clear_slots() to clear the leader's slot. My fix
> > mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> > leader's slot, but seems that's not required since
> > gather_merge_readnext() sets the leader's slot to the output of
> > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> > mistake. Rushabh's patch sidesteps this by adding a conditional
> > pfree() to not free slot that we didn't allocate in the first place.
> >
> > I do think the code could be improved a bit. I don't like the way
> > GatherMergeState's nreaders and nworkers_launched are always the same.
> > I think this all threw me off a bit and may have been the reason for
> > the bug in the first place.
>
> Yeah, if those fields are always the same, then I think that they
> should be merged. That seems undeniably a good idea.

Hmm I agree that it's good idea, and I will work on that as separate patch.

> Possibly we
> should fix the crash bug first, though, and then do that afterwards.
> What bugs me a little about Rushabh's fix is that it looks like magic.
> You have to know that we're looping over two things and freeing them
> up, but there's one more of one thing than the other thing. I think
> that at least needs some comments or something.
>
>
So in my second version of patch I change gather_merge_clear_slots() to
just clear the slot for the worker and some other clean up. Also throwing
NULL from gather_merge_getnext() when all the queues and heap are
exhausted - which earlier gather_merge_clear_slots() was returning clear
slot. This way we make sure that we don't run over freeing the slot for
the leader and gather_merge_getnext() don't need to depend on that
clear slot.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-03-27 16:38:38 Re: WIP: Faster Expression Processing v4
Previous Message Robert Haas 2017-03-27 16:35:50 Re: crashes due to setting max_parallel_workers=0