Re: [HACKERS] Parallel Append implementation

From: amul sul <sulamul(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Parallel Append implementation
Date: 2017-11-21 11:57:01
Message-ID: CAAJ_b97kLNW8Z9nvc_JUUG5wVQUXvG=f37WsX8ALF0A=KAHh3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Tue, Nov 21, 2017 at 2:22 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 21 November 2017 at 12:44, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> On Mon, Nov 13, 2017 at 12:54 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> Thanks a lot Robert for the patch. I will have a look. Quickly tried
>>> to test some aggregate queries with a partitioned pgbench_accounts
>>> table, and it is crashing. Will get back with the fix, and any other
>>> review comments.
>>>
>>> Thanks
>>> -Amit Khandekar
>>
>> I was trying to get the performance of this patch at commit id -
>> 11e264517dff7a911d9e6494de86049cab42cde3 and TPC-H scale factor 20
>> with the following parameter settings,
>> work_mem = 1 GB
>> shared_buffers = 10GB
>> effective_cache_size = 10GB
>> max_parallel_workers_per_gather = 4
>> enable_partitionwise_join = on
>>
>> and the details of the partitioning scheme is as follows,
>> tables partitioned = lineitem on l_orderkey and orders on o_orderkey
>> number of partitions in each table = 10
>>
>> As per the explain outputs PA was used in following queries- 1, 3, 4,
>> 5, 6, 7, 8, 10, 12, 14, 15, 18, and 21.
>> Unfortunately, at the time of executing any of these query, it is
>> crashing with the following information in core dump of each of the
>> workers,
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x0000000010600984 in pg_atomic_read_u32_impl (ptr=0x3ffffec29294)
>> at ../../../../src/include/port/atomics/generic.h:48
>> 48 return ptr->value;
>>
>> In case this a different issue as you pointed upthread, you may want
>> to have a look at this as well.
>> Please let me know if you need any more information in this regard.
>
> Right, for me the crash had occurred with a similar stack, although
> the real crash happened in one of the workers. Attached is the script
> file
> pgbench_partitioned.sql to create a schema with which I had reproduced
> the crash.
>
> The query that crashed :
> select sum(aid), avg(aid) from pgbench_accounts;
>
> Set max_parallel_workers_per_gather to at least 5.
>
> Also attached is v19 patch rebased.
>

I've spent little time to debug this crash. The crash happens in ExecAppend()
due to subnode in node->appendplans array is referred using incorrect
array index (out of bound value) in the following code:

/*
* figure out which subplan we are currently processing
*/
subnode = node->appendplans[node->as_whichplan];

This incorrect value to node->as_whichplan is get assigned in the
choose_next_subplan_for_worker().

By doing following change on the v19 patch does the fix for me:

--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -489,11 +489,9 @@ choose_next_subplan_for_worker(AppendState *node)
}

/* Pick the plan we found, and advance pa_next_plan one more time. */
- node->as_whichplan = pstate->pa_next_plan;
+ node->as_whichplan = pstate->pa_next_plan++;
if (pstate->pa_next_plan == node->as_nplans)
pstate->pa_next_plan = append->first_partial_plan;
- else
- pstate->pa_next_plan++;

/* If non-partial, immediately mark as finished. */
if (node->as_whichplan < append->first_partial_plan)

Attaching patch does same changes to Amit's ParallelAppend_v19_rebased.patch.

Regards,
Amul

Attachment Content-Type Size
fix_crash.patch application/octet-stream 686 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David CARLIER 2017-11-21 12:08:46 [PATCH] using arc4random for strong randomness matters.
Previous Message Amit Khandekar 2017-11-21 11:54:35 Re: [HACKERS] UPDATE of partition key