Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-09-11 11:19:06
Message-ID: CAJ3gD9cdSC2sZ84+QhZQ1CbufnDYvDdQ-=Q8P1Gy6zCdVkBbfA@mail.gmail.com
Lists: pgsql-hackers

On 8 September 2017 at 19:17, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Sep 8, 2017 at 3:59 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> On 7 September 2017 at 11:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Thu, Aug 31, 2017 at 12:47 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> 3.
>>> +/* ----------------------------------------------------------------
>>> + * exec_append_leader_next
>>> + *
>>> + * To be used only if it's a parallel leader. The backend should scan
>>> + * backwards from the last plan. This is to prevent it from taking up
>>> + * the most expensive non-partial plan, i.e. the first subplan.
>>> + * ----------------------------------------------------------------
>>> + */
>>> +static bool
>>> +exec_append_leader_next(AppendState *state)
>>>
>>> From above explanation, it is clear that you don't want backend to
>>> pick an expensive plan for a leader, but the reason for this different
>>> treatment is not clear.
>>
>> Explained it, saying that for more workers, a leader spends more work
>> in processing the worker tuples , and less work contributing to
>> parallel processing. So it should not take expensive plans, otherwise
>> it will affect the total time to finish Append plan.
>>
>
> In that case, why can't we keep the workers also process in same
> order, what is the harm in that?

Because of the way the queuing logic works, the workers finish earlier
if they start with the expensive plans first. For example, with 3 plans
of costs 8, 4 and 4, and with 2 workers w1 and w2, the work finishes in
8 time units (w1 finishes plan 1 in 8 units, while in parallel w2
finishes the remaining 2 plans in 8 units). Whereas, if the plans are
ordered 4, 4, 8, the workers finish in 12 time units (w1 and w2 each
finish one of the first two plans in 4 units, and then one of them
takes up plan 3 and finishes it in another 8 units, while the other
worker remains idle).
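
To make the arithmetic concrete, here is a toy, standalone sketch of that
greedy assignment (just an illustration of the scheduling effect, not
executor code from the patch) :

#include <stdio.h>

/*
 * Greedy assignment: each subplan, taken in the given order, goes to
 * whichever worker becomes free first; the result is the time at which
 * the last worker finishes.
 */
static int
makespan(const int *costs, int nplans, int nworkers)
{
	int		busy_until[8] = {0};	/* per-worker finish time; nworkers <= 8 */
	int		finish = 0;

	for (int i = 0; i < nplans; i++)
	{
		int		best = 0;

		/* pick the worker that becomes idle earliest */
		for (int w = 1; w < nworkers; w++)
			if (busy_until[w] < busy_until[best])
				best = w;

		busy_until[best] += costs[i];
		if (busy_until[best] > finish)
			finish = busy_until[best];
	}
	return finish;
}

int
main(void)
{
	int		expensive_first[] = {8, 4, 4};
	int		expensive_last[] = {4, 4, 8};

	printf("expensive plan first: %d units\n", makespan(expensive_first, 3, 2));	/* prints 8 */
	printf("expensive plan last:  %d units\n", makespan(expensive_last, 3, 2));	/* prints 12 */
	return 0;
}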

> Also, the leader will always scan
> the subplans from last subplan even if all the subplans are partial
> plans.

Since we already need to have two different code paths (one for the
leader and one for the workers), I think we can use those same code
paths irrespective of whether the subplans are partial or non-partial.
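
Roughly, the two directions look like this (a hypothetical sketch with
made-up names, not the actual code in the patch) :

#include <stdbool.h>

static int
next_plan_for_leader(const bool *finished, int nplans)
{
	/*
	 * The leader scans backwards from the last subplan, so that it does
	 * not pick up the expensive non-partial subplans kept at the front
	 * of the list.
	 */
	for (int i = nplans - 1; i >= 0; i--)
		if (!finished[i])
			return i;
	return -1;					/* all subplans are done */
}

static int
next_plan_for_worker(const bool *finished, int nplans)
{
	/*
	 * Workers scan forwards, so the most expensive non-partial subplans
	 * get picked up (and hence finished) first.
	 */
	for (int i = 0; i < nplans; i++)
		if (!finished[i])
			return i;
	return -1;
}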

> I think this will be the unnecessary difference in the
> strategy of leader and worker especially when all paths are partial.
> I think the selection of next subplan might become simpler if we use
> the same strategy for worker and leader.

Yeah, if we had a common method for both, it would have been better. But
in any case, we have separate logic to maintain for the leader and the
workers.

>
> Few more comments:
>
> 1.
> + else if (IsA(subpath, MergeAppendPath))
> + {
> + MergeAppendPath *mpath = (MergeAppendPath *) subpath;
> +
> + /*
> + * If at all MergeAppend is partial, all its child plans have to be
> + * partial : we don't currently support a mix of partial and
> + * non-partial MergeAppend subpaths.
> + */
> + if (is_partial)
> + return list_concat(partial_subpaths, list_copy(mpath->subpaths));
>
> In which situation partial MergeAppendPath is generated? Can you
> provide one example of such path?

Actually, currently we don't support partial paths for MergeAppendPath.
The code does have that if (is_partial) condition, but at present it
will never be true for a MergeAppendPath.

>
> 2.
> add_paths_to_append_rel()
> {
> ..
> + /* Consider parallel append path. */
> + if (pa_subpaths_valid)
> + {
> + AppendPath *appendpath;
> + int parallel_workers;
> +
> + parallel_workers = get_append_num_workers(pa_partial_subpaths,
> + pa_nonpartial_subpaths);
> + appendpath = create_append_path(rel, pa_nonpartial_subpaths,
> + pa_partial_subpaths,
> + NULL, parallel_workers, true,
> + partitioned_rels);
> + add_partial_path(rel, (Path *) appendpath);
> + }
> +
> /*
> - * Consider an append of partial unordered, unparameterized partial paths.
> + * Consider non-parallel partial append path. But if the parallel append
> + * path is made out of all partial subpaths, don't create another partial
> + * path; we will keep only the parallel append path in that case.
> */
> - if (partial_subpaths_valid)
> + if (partial_subpaths_valid && !pa_all_partial_subpaths)
> {
> AppendPath *appendpath;
> ListCell *lc;
> int parallel_workers = 0;
>
> /*
> - * Decide on the number of workers to request for this append path.
> - * For now, we just use the maximum value from among the members. It
> - * might be useful to use a higher number if the Append node were
> - * smart enough to spread out the workers, but it currently isn't.
> + * To decide the number of workers, just use the maximum value from
> + * among the children.
> */
> foreach(lc, partial_subpaths)
> {
> @@ -1421,9 +1502,9 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
> }
> Assert(parallel_workers > 0);
>
> - /* Generate a partial append path. */
> - appendpath = create_append_path(rel, partial_subpaths, NULL,
> - parallel_workers, partitioned_rels);
> + appendpath = create_append_path(rel, NIL, partial_subpaths,
> + NULL, parallel_workers, false,
> + partitioned_rels);
> add_partial_path(rel, (Path *) appendpath);
> }
> ..
> }
>
> I think it might be better to add a sentence why we choose a different
> way to decide a number of workers in the second case
> (non-parallel-aware append).

Yes, I agree. Will do that once we conclude on your next point below ...

> Do you think non-parallel-aware Append
> will be better in any case when there is a parallel-aware append? I
> mean to say let's try to create non-parallel-aware append only when
> parallel-aware append is not possible.

By non-parallel-aware append, I am assuming you meant partial
non-parallel-aware Append. Yes, if the parallel-aware Append path has
*all* partial subpaths chosen, then we do omit a partial non-parallel
Append path, as seen in this code in the patch:

/*
* Consider non-parallel partial append path. But if the parallel append
* path is made out of all partial subpaths, don't create another partial
* path; we will keep only the parallel append path in that case.
*/
if (partial_subpaths_valid && !pa_all_partial_subpaths)
{
......
}

But if the parallel Append path has a mix of partial and non-partial
subpaths, then we can't really tell which of the two is cheaper until we
calculate the costs; the non-parallel-aware partial Append may well turn
out to be cheaper.

>
> 3.
> + * evaluates to a value just a bit greater than max(w1,w2, w3). So, we
>
> The spacing between w1, w2, w3 is not same.

Right, will note this down for the next updated patch.

>
> 4.
> - select count(*) from a_star;
> -select count(*) from a_star;
> + select round(avg(aa)), sum(aa) from a_star;
> +select round(avg(aa)), sum(aa) from a_star;
>
> Why you have changed the existing test. It seems count(*) will also
> give what you are expecting.

Needed to cover some data testing with Parallel Append execution.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
