Re: crashes due to setting max_parallel_workers=0

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crashes due to setting max_parallel_workers=0
Date: 2017-03-28 09:07:39
Message-ID: CAGPqQf0CcH+cyU=WK+bUvwgFEVFdGPFaBC2RMRz_hPCEKEFhwg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>
>
> On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
>
>>
>> ...
>> I was doing more testing with the patch and I found one more server
>> crash with the patch around same area, when we forced the gather
>> merge for the scan having zero rows.
>>
>> create table dept (deptno numeric, dname varchar(20));
>> set parallel_tuple_cost =0;
>> set parallel_setup_cost =0;
>> set min_parallel_table_scan_size =0;
>> set min_parallel_index_scan_size =0;
>> set force_parallel_mode=regress;
>> explain analyze select * from dept order by deptno;
>>
>> This is because we don't initialize the leader's slot in gm_slots. So
>> in the case where zero workers are launched and the table has zero rows,
>> we end up with a NULL slot in the gm_slots array.
>>
>> Currently gather_merge_clear_slots() clears out the tuple table slots for
>> each gather merge input and returns a cleared slot. In the patch I modified
>> gather_merge_clear_slots() to just clear out the tuple table slots and
>> always return NULL once all the queues and the heap are exhausted.
>>
>>
> Isn't that just another sign the code might be a bit too confusing? I see
> two main issues in the code:
>
> 1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
> way to cause off-by-one errors
>
> 2) mixing objects with different life spans (slots for readers vs. slot
> for the leader) seems like a bad idea too
>
> I wonder how much we gain by reusing the slot from the leader (I'd be
> surprised if it was at all measurable). David posted a patch reworking
> this and significantly simplifying the GatherMerge node. Why not accept
> that?
>
>
>
I think we all agree that we should get rid of nreaders from
GatherMergeState and refactor this code. But if I understood correctly,
Robert's concern was that the refactoring should be a separate commit.

I picked up David's patch and started reviewing the changes. I applied it
on top of my v2 patch (which refactors gather_merge_clear_slots()).

In David's patch, in gather_merge_init(), the loop that allocates the
tuple arrays needs to run only up to nworkers_launched, because we don't
keep a tuple array for the leader. I changed that, made a few other small
adjustments based on my v2 patch, and did some manual testing of the
changes.

Please find attached the refactoring patch, on top of the v2 patch
submitted for the server crash fix. (Attaching both patches here again,
for ease of access.)

Thanks,

--
Rushabh Lathia
www.EnterpriseDB.com

Attachment Content-Type Size
gather_merge_refactor.patch text/x-patch 6.7 KB
gm_nworkers_launched_zero_v2.patch text/x-patch 1.1 KB
