Re: Parallel Append implementation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-09-11 13:25:19
Message-ID: CAA4eK1++-H=04TjeFKAkbDZ4TL0W092ryAqf8mrjU+pTQUS8+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 11, 2017 at 4:49 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 8 September 2017 at 19:17, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> In that case, why can't we keep the workers also process in same
>> order, what is the harm in that?
>
> Because of the way the logic of queuing works, the workers finish
> earlier if they start with expensive plans first. For e.g. : with 3
> plans with costs 8, 4, 4 and with 2 workers w1 and w2, they will
> finish in 8 time units (w1 will finish plan 1 in 8, while in parallel
> w2 will finish the remaining 2 plans in 8 units. Whereas if the plans
> are ordered like : 4, 4, 8, then the workers will finish in 12 time
> units (w1 and w2 will finish each of the 1st two plans in 4 units, and
> then w1 or w2 will take up plan 3 and finish in 8 units, while the
> other worker remains idle).
>

I think the patch stores only non-partial paths in decreasing order,
what if partial paths having more costs follows those paths?

>
>>
>> Few more comments:
>>
>> 1.
>> + else if (IsA(subpath, MergeAppendPath))
>> + {
>> + MergeAppendPath *mpath = (MergeAppendPath *) subpath;
>> +
>> + /*
>> + * If at all MergeAppend is partial, all its child plans have to be
>> + * partial : we don't currently support a mix of partial and
>> + * non-partial MergeAppend subpaths.
>> + */
>> + if (is_partial)
>> + return list_concat(partial_subpaths, list_copy(mpath->subpaths));
>>
>> In which situation partial MergeAppendPath is generated? Can you
>> provide one example of such path?
>
> Actually currently we don't support partial paths for MergeAppendPath.
> That code just has that if condition (is_partial) but currently that
> condition won't be true for MergeAppendPath.
>

I think then it is better to have Assert for MergeAppend. It is
generally not a good idea to add code which we can never hit.

>>
>> 2.
>> add_paths_to_append_rel()
..
>>
>> I think it might be better to add a sentence why we choose a different
>> way to decide a number of workers in the second case
>> (non-parallel-aware append).
>
> Yes, I agree. Will do that after we conclude with your next point below ...
>
>> Do you think non-parallel-aware Append
>> will be better in any case when there is a parallel-aware append? I
>> mean to say let's try to create non-parallel-aware append only when
>> parallel-aware append is not possible.
>
> By non-parallel-aware append, I am assuming you meant partial
> non-parallel-aware Append. Yes, if the parallel-aware Append path has
> *all* partial subpaths chosen, then we do omit a partial non-parallel
> Append path, as seen in this code in the patch :
>
> /*
> * Consider non-parallel partial append path. But if the parallel append
> * path is made out of all partial subpaths, don't create another partial
> * path; we will keep only the parallel append path in that case.
> */
> if (partial_subpaths_valid && !pa_all_partial_subpaths)
> {
> ......
> }
>
> But if the parallel-Append path has a mix of partial and non-partial
> subpaths, then we can't really tell which of the two could be cheapest
> until we calculate the cost. It can be that the non-parallel-aware
> partial Append can be cheaper as well.
>

How? See, if you have four partial subpaths and two non-partial
subpaths, then for parallel-aware append it considers all six paths in
parallel path whereas for non-parallel-aware append it will consider
just four paths and that too with sub-optimal strategy. Can you
please try to give me some example so that it will be clear.

>>
>> 4.
>> - select count(*) from a_star;
>> -select count(*) from a_star;
>> + select round(avg(aa)), sum(aa) from a_star;
>> +select round(avg(aa)), sum(aa) from a_star;
>>
>> Why you have changed the existing test. It seems count(*) will also
>> give what you are expecting.
>
> Needed to do cover some data testing with Parallel Append execution.
>

Okay.

One more thing, I think you might want to update comment atop
add_paths_to_append_rel to reflect the new type of parallel-aware
append path being generated. In particular, I am referring to below
part of the comment:

* Similarly it collects partial paths from
* non-dummy children to create partial append paths.
*/
static void
add_paths_to_append_rel()

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-11 14:12:57 Re: [POC] hash partitioning
Previous Message Aleksander Alekseev 2017-09-11 13:01:25 Re: Automatic testing of patches in commit fest