Re: crashes due to setting max_parallel_workers=0

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crashes due to setting max_parallel_workers=0
Date: 2017-03-27 16:11:14
Message-ID: CAKJS1f_DJmH44o0ODDoJb=rAUeo5b39u9PTa_xFD6AuqyptBOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 March 2017 at 04:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>> About the original issue reported by Tomas, I did more debugging and
>> found that - problem was gather_merge_clear_slots() was not returning
>> the clear slot when nreader is zero (means nworkers_launched = 0).
>> Due to the same scan was continue even all the tuple are exhausted,
>> and then end up with server crash at gather_merge_getnext(). In the patch
>> I also added the Assert into gather_merge_getnext(), about the index
>> should be less then the nreaders + 1 (leader).
>
> Well, you and David Rowley seem to disagree on what the fix is here.
> His patches posted upthread do A, and yours do B, and from a quick
> look those things are not just different ways of spelling the same
> underlying fix, but actually directly conflicting ideas about what the
> fix should be. Any chance you can review his patches, and maybe he
> can review yours, and we could try to agree on a consensus position?
> :-)

When comparing Rushabh's to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free slot that we didn't allocate in the first place.

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2017-03-27 16:16:11 Re: Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
Previous Message Mike Palmiotto 2017-03-27 16:09:08 Re: partitioned tables and contrib/sepgsql