Re: Proposal : Parallel Merge Join

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-02-14 11:52:42
Message-ID: CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 12:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Few review comments on the latest version of the patch:
> 1.
> - if (joinrel->consider_parallel && nestjoinOK &&
> - save_jointype != JOIN_UNIQUE_OUTER &&
> - bms_is_empty(joinrel->lateral_relids))
> + if (outerrel->partial_pathlist == NIL ||
> + !joinrel->consider_parallel ||
> + save_jointype == JOIN_UNIQUE_OUTER ||
> + !bms_is_empty(joinrel->lateral_relids) ||
> + jointype == JOIN_FULL ||
> + jointype == JOIN_RIGHT)
> + return;
>
>
> For the matter of consistency, I think the above checks should be in
> same order as they are in function hash_inner_and_outer(). I wonder
> why are you using jointype in above check instead of save_jointype, it
> seems to me we can use save_jointype for all the cases.

Done

>
> 2.
> + generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
> +
> jointype, extra, false,
> +
> inner_cheapest_total, merge_pathkeys,
> +
> true);
>
> IIUC, the reason of passing a 7th parameter (useallclauses) as false
> is that it can be true only for Right and Full joins and for both we
> don't generate partial merge join paths. If so, then I think it is
> better to add a comment about such an assumption, so that we don't
> forget to update this code in future if we need to useallclauses for
> something else. Another way to deal with this is that you can pass
> the value of useallclauses to consider_parallel_mergejoin() and then
> to generate_mergejoin_paths(). I feel second way is better, but if
> for some reason you don't find it appropriate then at least add a
> comment.

After fixing 3rd comments this will not be needed.
>
> 3.
> generate_mergejoin_paths()
> {

>
> The above and similar changes in generate_mergejoin_paths() doesn't
> look good and in some cases when innerpath is *parallel-unsafe*, you
> need to perform some extra computation which is not required. How
> about writing a separate function generate_partial_mergejoin_paths()?
> I think you can save some extra computation and it will look neat. I
> understand that there will be some code duplication, but not sure if
> there is any other better way.

Okay, I have done as suggested.

Apart from this, there was one small problem in an earlier version
which I have corrected in this.

+ /* Consider only parallel safe inner path */
+ if (innerpath != NULL &&
+ innerpath->parallel_safe &&
+ (cheapest_total_inner == NULL ||
+ cheapest_total_inner->parallel_safe == false ||
+ compare_path_costs(innerpath, cheapest_total_inner,
+ TOTAL_COST) < 0))

In this comparison, we were only checking if innerpath is cheaper than
the cheapest_total_inner then generate path with this new inner path
as well. Now, I have added one more check if cheapest_total_inner was
not parallel safe then also consider a path with this new inner
(provided this inner is parallel safe).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_mergejoin_v5.patch application/octet-stream 13.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-02-14 12:00:16 Re: Should we cacheline align PGXACT?
Previous Message Simon Riggs 2017-02-14 11:52:12 Re: Measuring replay lag