Re: Proposal : Parallel Merge Join

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-02-14 06:55:18
Message-ID: CAA4eK1+D7TN8PUMjfqHLn+ufUdZgo58OXidnrWmReLT-XqrQew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 4, 2017 at 9:27 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

>> 3.
>> + /*
>> + * See comments in try_partial_nestloop_path().
>> + */
>>
>> This same code exists in try_partial_nestloop_path() and
>> try_partial_hashjoin_path() with minor difference of code in
>> try_partial_hashjoin_path(). Can't we write a separate inline
>> function for this code and call from all the three places.
>
> It's a small check, is it ok to make it the separate function. I have
> also observed similar kind of duplicate code in try_mergejoin_path and
> try_hashjoin_path. However, if you think that it will be better to
> move that check to an inline function I can submit a seperate patch in
> this thread as a refactoring patch?
>

Let's keep that check unchanged.

Few review comments on the latest version of the patch:
1.
- if (joinrel->consider_parallel && nestjoinOK &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- bms_is_empty(joinrel->lateral_relids))
+ if (outerrel->partial_pathlist == NIL ||
+ !joinrel->consider_parallel ||
+ save_jointype == JOIN_UNIQUE_OUTER ||
+ !bms_is_empty(joinrel->lateral_relids) ||
+ jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT)
+ return;

For the matter of consistency, I think the above checks should be in
same order as they are in function hash_inner_and_outer(). I wonder
why are you using jointype in above check instead of save_jointype, it
seems to me we can use save_jointype for all the cases.

2.
+ generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
+
jointype, extra, false,
+
inner_cheapest_total, merge_pathkeys,
+
true);

IIUC, the reason of passing a 7th parameter (useallclauses) as false
is that it can be true only for Right and Full joins and for both we
don't generate partial merge join paths. If so, then I think it is
better to add a comment about such an assumption, so that we don't
forget to update this code in future if we need to useallclauses for
something else. Another way to deal with this is that you can pass
the value of useallclauses to consider_parallel_mergejoin() and then
to generate_mergejoin_paths(). I feel second way is better, but if
for some reason you don't find it appropriate then at least add a
comment.

3.
generate_mergejoin_paths()
{
..
..
- try_mergejoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- merge_pathkeys,
- newclauses,
- NIL,
- NIL,
- jointype,
- extra);

+ if (!is_partial)
+ try_mergejoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ merge_pathkeys,
+ newclauses,
+ NIL,
+ NIL,
+ jointype,
+ extra);

+ /* Generate partial path only if innerpath is parallel safe. */
+ else if (innerpath->parallel_safe)
+ try_partial_mergejoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ merge_pathkeys,
+ newclauses,
+ NIL,
+ NIL,
+ jointype,
+ extra);

..
}

The above and similar changes in generate_mergejoin_paths() doesn't
look good and in some cases when innerpath is *parallel-unsafe*, you
need to perform some extra computation which is not required. How
about writing a separate function generate_partial_mergejoin_paths()?
I think you can save some extra computation and it will look neat. I
understand that there will be some code duplication, but not sure if
there is any other better way.

How about adding one simple test case as I have kept for other
parallel patches like parallel index scan, parallel subplan, etc.?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-02-14 07:07:46 Re: Documentation improvements for partitioning
Previous Message Michael Paquier 2017-02-14 06:45:12 Re: REINDEX CONCURRENTLY 2.0