Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-10-11 12:51:19
Message-ID: CAJ3gD9caK33D5ttj-GBKnD5t2jAEUPpOWMPkcnkS6qw-LejuEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 9 October 2017 at 16:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Oct 6, 2017 at 12:03 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> On 6 October 2017 at 08:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>
>>> Okay, but why not cheapest partial path?
>>
>> I gave some thought on this point. Overall I feel it does not matter
>> which partial path it should pick up. RIght now the partial paths are
>> not ordered. But for non-partial paths sake, we are just choosing the
>> very last path. So in case of mixed paths, leader will get a partial
>> path, but that partial path would not be the cheapest path. But if we
>> also order the partial paths, the same logic would then pick up
>> cheapest partial path. The question is, should we also order the
>> partial paths for the leader ?
>>
>> The only scenario I see where leader choosing cheapest partial path
>> *might* show some benefit, is if there are some partial paths that
>> need to do some startup work using only one worker. I think currently,
>> parallel hash join is one case where it builds the hash table, but I
>> guess here also, we support parallel hash build, but not sure about
>> the status.
>>
>
> You also need to consider how merge join is currently work in parallel
> (each worker need to perform the whole of work of right side).

Yes, here if the leader happens to take the right side, it may slow
down the overall merge join. But this seems to be a different case
than the case of high startup costs.

> I think there could be more scenario's where the startup cost is high
> and parallel worker needs to do that work independently.

True.

>
> For such plan, if leader starts it, it would be slow, and
>> no other worker would be able to help it, so its actual startup cost
>> would be drastically increased. (Another path is parallel bitmap heap
>> scan where the leader has to do something and the other workers wait.
>> But here, I think it's not much work for the leader to do). So
>> overall, to handle such cases, it's better for leader to choose a
>> cheapest path, or may be, a path with cheapest startup cost. We can
>> also consider sorting partial paths with decreasing startup cost.
>>
>
> Yeah, that sounds reasonable.

Attached patch sorts partial paths by descending startup cost.

On 6 October 2017 at 08:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>
>> Ok. How about removing pa_all_partial_subpaths altogether , and
>> instead of the below condition :
>>
>> /*
>> * If all the child rels have partial paths, and if the above Parallel
>> * Append path has a mix of partial and non-partial subpaths, then consider
>> * another Parallel Append path which will have *all* partial subpaths.
>> * If enable_parallelappend is off, make this one non-parallel-aware.
>> */
>> if (partial_subpaths_valid && !pa_all_partial_subpaths)
>> ......
>>
>> Use this condition :
>> if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL)
>> ......
>>
>
> Sounds good to me.

Did this. Here is the new condition I used along with the comments
explaining it :

+ * If parallel append has not been added above, or the added
one has a mix
+ * of partial and non-partial subpaths, then consider another Parallel
+ * Append path which will have *all* partial subpaths. We can add such a
+ * path only if all childrels have partial paths in the first
place. This
+ * new path will be parallel-aware unless enable_parallelappend is off.
*/
- if (partial_subpaths_valid && !pa_all_partial_subpaths)
+ if (partial_subpaths_valid &&
+ (!pa_subpaths_valid || pa_nonpartial_subpaths != NIL))

Also added some test scenarios.

On 6 October 2017 at 12:03, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 6 October 2017 at 08:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> One minor point:
>>
>> + if (!node->as_padesc)
>> + {
>> + /*
>> + */
>> + if (!exec_append_seq_next(node))
>> + return ExecClearTuple(node->ps.ps_ResultTupleSlot);
>> + }
>>
>> It seems either you want to add a comment in above part of patch or
>> you just left /**/ mistakenly.
>
> Oops. Yeah, the comment wrapper remained there when I moved its
> content "This is Parallel-aware append. Follow it's own logic ..." out
> of the if block.

Removed the comment wrapper.

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
ParallelAppend_v17.patch application/octet-stream 60.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-10-11 13:28:34 Re: Is it time to kill support for very old servers?
Previous Message johannes graën 2017-10-11 12:19:08 Re: performance drop after upgrade (9.6 > 10)