Re: Proposal : Parallel Merge Join

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-02-24 09:34:11
Message-ID: CAA4eK1K=_qkc67T_M82OxdT7Au9h7WSJUgmLk5Oe88tie0Jx3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 5:22 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Tue, Feb 14, 2017 at 12:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Apart from this, there was one small problem in an earlier version
> which I have corrected in this.
>
> + /* Consider only parallel safe inner path */
> + if (innerpath != NULL &&
> + innerpath->parallel_safe &&
> + (cheapest_total_inner == NULL ||
> + cheapest_total_inner->parallel_safe == false ||
> + compare_path_costs(innerpath, cheapest_total_inner,
> + TOTAL_COST) < 0))
>
> In this comparison, we were only checking if innerpath is cheaper than
> the cheapest_total_inner then generate path with this new inner path
> as well. Now, I have added one more check if cheapest_total_inner was
> not parallel safe then also consider a path with this new inner
> (provided this inner is parallel safe).
>

What advantage do you see for considering such a path when the cost of
innerpath is more than cheapest_total_inner? Remember the more paths
we try to consider, the more time we spend in the planner. By any
chance are you able to generate any query where it will give benefit
by considering costlier innerpath?

2.
+static void generate_parallel_mergejoin_paths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *innerrel,
+ Path *outerpath,
+ JoinType jointype,
+ JoinPathExtraData *extra,
+ Path *inner_cheapest_total,
+ List *merge_pathkeys);

It is better to name this function as
generate_partial_mergejoin_paths() as we are generating only partial
paths in this function and accordingly change the comment on top of
the function. I see that you might be naming it based on
consider_parallel_*, however, I think it is better to use partial in
the name as that is what we are doing inside that function. Also, I
think this function has removed/changed some handling related to
unique outer and full joins, so it is better to mention that in the
function comments, something like "unlike above function, this
function doesn't expect to handle join types JOIN_UNIQUE_OUTER or
JOIN_FULL" and add Assert for both of those types.

3.
A test case is still missing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-24 09:53:32 Re: Patch: Write Amplification Reduction Method (WARM)
Previous Message Okano, Naoki 2017-02-24 09:25:29 Re: Keep ECPG comment for log_min_duration_statement