Re: Parallel Append implementation

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-10-06 06:33:59
Message-ID: CAJ3gD9cJLNVqDNHAemJXGNcEkJsfv6V39UB+1t0Hv90sEveEoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6 October 2017 at 08:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>
>> Ok. How about removing pa_all_partial_subpaths altogether , and
>> instead of the below condition :
>>
>> /*
>> * If all the child rels have partial paths, and if the above Parallel
>> * Append path has a mix of partial and non-partial subpaths, then consider
>> * another Parallel Append path which will have *all* partial subpaths.
>> * If enable_parallelappend is off, make this one non-parallel-aware.
>> */
>> if (partial_subpaths_valid && !pa_all_partial_subpaths)
>> ......
>>
>> Use this condition :
>> if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL)
>> ......
>>
>
> Sounds good to me.
>
> One minor point:
>
> + if (!node->as_padesc)
> + {
> + /*
> + */
> + if (!exec_append_seq_next(node))
> + return ExecClearTuple(node->ps.ps_ResultTupleSlot);
> + }
>
> It seems either you want to add a comment in above part of patch or
> you just left /**/ mistakenly.

Oops. Yeah, the comment wrapper remained there when I moved its
content "This is Parallel-aware append. Follow it's own logic ..." out
of the if block. Since this is too small a change for an updated
patch, I will do this along with any other changes that would be
required as the review progresses.

>
>> ----
>>
>>
>> Regarding a mix of partial and non-partial paths, I feel it always
>> makes sense for the leader to choose the partial path.
>>
>
> Okay, but why not cheapest partial path?

I gave some thought on this point. Overall I feel it does not matter
which partial path it should pick up. RIght now the partial paths are
not ordered. But for non-partial paths sake, we are just choosing the
very last path. So in case of mixed paths, leader will get a partial
path, but that partial path would not be the cheapest path. But if we
also order the partial paths, the same logic would then pick up
cheapest partial path. The question is, should we also order the
partial paths for the leader ?

The only scenario I see where leader choosing cheapest partial path
*might* show some benefit, is if there are some partial paths that
need to do some startup work using only one worker. I think currently,
parallel hash join is one case where it builds the hash table, but I
guess here also, we support parallel hash build, but not sure about
the status. For such plan, if leader starts it, it would be slow, and
no other worker would be able to help it, so its actual startup cost
would be drastically increased. (Another path is parallel bitmap heap
scan where the leader has to do something and the other workers wait.
But here, I think it's not much work for the leader to do). So
overall, to handle such cases, it's better for leader to choose a
cheapest path, or may be, a path with cheapest startup cost. We can
also consider sorting partial paths with decreasing startup cost.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2017-10-06 06:48:13 Re: Another oddity in handling of WCO constraints in postgres_fdw
Previous Message Dilip Kumar 2017-10-06 06:12:35 Re: Proposal: Improve bitmap costing for lossy pages