Re: Proposal : Parallel Merge Join

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-01-04 06:32:21
Message-ID: CAA4eK1LNeu8s_LygVdPBWqAur_e0q8YHGQucP5pqx2Mg6qoHHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Wed, Dec 21, 2016 at 9:23 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Wed, Dec 21, 2016 at 8:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Committed the refactoring patch with some mild cosmetic adjustments.
>
> Thanks..
>>
>> As to the second patch:
>>
>> + if (jointype == JOIN_UNIQUE_INNER)
>> + jointype = JOIN_INNER;
>>
>> Isn't this dead code? save_jointype might that value, but jointype won't.
>
> Yes, it is.
>
> I have fixed this, and also observed that comment for
> try_partial_mergejoin_path header was having some problem, fixed that
> too.
>

Review comments:
1.
+ bool is_partial);
+

Seems additional new line is not required.

2.
+ * try_partial_mergejoin_path
+ * Consider a partial merge join path; if it appears useful,
push it into
+ * the joinrel's pathlist via add_path().
+ */

I think in above comment, you should say ".. joinrel's
partial_pathlist via add_partial_path()"

3.
+ /*
+ * See comments in try_partial_nestloop_path().
+ */
+ Assert(bms_is_empty
(joinrel->lateral_relids));
+ if (inner_path->param_info != NULL)
+ {
+ Relids
inner_paramrels = inner_path->param_info->ppi_req_outer;
+
+ if (!bms_is_subset
(inner_paramrels, outer_path->parent->relids))
+ return;
+ }

This same code exists in try_partial_nestloop_path() and
try_partial_hashjoin_path() with minor difference of code in
try_partial_hashjoin_path(). Can't we write a separate inline
function for this code and call from all the three places.

4.
+ /*
+ * See comments in try_nestloop_path().
+ */
+ initial_cost_mergejoin(root,
&workspace, jointype, mergeclauses,
+ outer_path,
inner_path,
+ outersortkeys, innersortkeys,
+
extra->sjinfo);

I think in above comments, you want to say try_partial_nestloop_path().

5.
- if (joinrel->consider_parallel && nestjoinOK &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- bms_is_empty(joinrel->lateral_relids))
+ if (!joinrel->consider_parallel ||
+ save_jointype == JOIN_UNIQUE_OUTER ||
+ !bms_is_empty(joinrel->lateral_relids) ||
+ jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT)
+ return;
+
+ if (nestjoinOK)

Here, we only want to create partial paths when
outerrel->partial_pathlist is not NILL, so I think you can include
that check as well. Another point to note is that in HEAD, we are not
checking for jointype as JOIN_RIGHT or JOIN_FULL for considering
parallel nestloop paths, whereas with your patch, it will do those
checks, is it a bug of HEAD or is there any other reason for including
those checks for parallel nestloop paths?

6.
+ /* Can't generate mergejoin path if inner rel is parameterized by outer */
+ if (inner_cheapest_total != NULL)
+ {
+ ListCell *lc1;
+
+ /* generate merge join path for each partial outer path */
+ foreach(lc1, outerrel->partial_pathlist)
+ {
+ Path *outerpath = (Path *) lfirst(lc1);
+ List *merge_pathkeys;
+
+ /*
+ * Figure out what useful ordering any paths we create
+ * will have.
+ */
+ merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+ outerpath->pathkeys);
+
+ generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
+ save_jointype, extra, false,
+ inner_cheapest_total, merge_pathkeys,
+ true);
+ }
+
+ }

Won't it be better if you encapsulate the above chunk of code in
function consider_parallel_merjejoin() similar to
consider_parallel_nestloop()? I think that way code will look
cleaner.

7.
+ /*
+ * Figure out what useful ordering any paths we create
+ * will have.
+ */
+ merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+ outerpath->pathkeys);

indentation problem with variable outerpath->pathkeys.

8.
- try_mergejoin_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- mergeclauses,
- NIL,
- innersortkeys,
- jointype,
- extra);
+ if (!is_partial)
+ try_mergejoin_path(root,
+ joinrel,
+ outerpath,
+ inner_cheapest_total,
+ merge_pathkeys,
+ mergeclauses,
+ NIL,
+ innersortkeys,
+ jointype,
+ extra);
+
+ /* Generate partial path if inner is parallel safe. */
+ else if (inner_cheapest_total->parallel_safe)
+ try_partial_mergejoin_path(root,
+ joinrel,
+ outerpath,
+ inner_cheapest_total,
+ merge_pathkeys,
+ mergeclauses,
+ NIL,
+ innersortkeys,
+ jointype,
+ extra);

In above code indentation is broken, similarly, there is another code
in a patch where it is broken, try using pgindent.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-01-04 06:46:54 Re: Potential data loss of 2PC files
Previous Message Abhijit Menon-Sen 2017-01-04 06:12:42 Re: [COMMITTERS] pgsql: Update copyright for 2017