From: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: crashes due to setting max_parallel_workers=0 |
Date: | 2017-03-27 16:36:57 |
Message-ID: | CAGPqQf2=s2i578Lw5fao+EaN_FJXTXQ5MGdvJq8ECZofGOxveA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > On 28 March 2017 at 04:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> >> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> >>> About the original issue reported by Tomas, I did more debugging and
> >>> found that - problem was gather_merge_clear_slots() was not returning
> >>> the clear slot when nreader is zero (means nworkers_launched = 0).
> >>> Due to the same scan was continue even all the tuple are exhausted,
> >>> and then end up with server crash at gather_merge_getnext(). In the
> patch
> >>> I also added the Assert into gather_merge_getnext(), about the index
> >>> should be less then the nreaders + 1 (leader).
> >>
> >> Well, you and David Rowley seem to disagree on what the fix is here.
> >> His patches posted upthread do A, and yours do B, and from a quick
> >> look those things are not just different ways of spelling the same
> >> underlying fix, but actually directly conflicting ideas about what the
> >> fix should be. Any chance you can review his patches, and maybe he
> >> can review yours, and we could try to agree on a consensus position?
> >> :-)
> >
> > When comparing Rushabh's to my minimal patch, both change
> > gather_merge_clear_slots() to clear the leader's slot. My fix
> > mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> > leader's slot, but seems that's not required since
> > gather_merge_readnext() sets the leader's slot to the output of
> > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> > mistake. Rushabh's patch sidesteps this by adding a conditional
> > pfree() to not free slot that we didn't allocate in the first place.
> >
> > I do think the code could be improved a bit. I don't like the way
> > GatherMergeState's nreaders and nworkers_launched are always the same.
> > I think this all threw me off a bit and may have been the reason for
> > the bug in the first place.
>
> Yeah, if those fields are always the same, then I think that they
> should be merged. That seems undeniably a good idea.
Hmm I agree that it's good idea, and I will work on that as separate patch.
> Possibly we
> should fix the crash bug first, though, and then do that afterwards.
> What bugs me a little about Rushabh's fix is that it looks like magic.
> You have to know that we're looping over two things and freeing them
> up, but there's one more of one thing than the other thing. I think
> that at least needs some comments or something.
>
>
So in my second version of patch I change gather_merge_clear_slots() to
just clear the slot for the worker and some other clean up. Also throwing
NULL from gather_merge_getnext() when all the queues and heap are
exhausted - which earlier gather_merge_clear_slots() was returning clear
slot. This way we make sure that we don't run over freeing the slot for
the leader and gather_merge_getnext() don't need to depend on that
clear slot.
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2017-03-27 16:38:38 | Re: WIP: Faster Expression Processing v4 |
Previous Message | Robert Haas | 2017-03-27 16:35:50 | Re: crashes due to setting max_parallel_workers=0 |