Re: Proposal : Parallel Merge Join

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-01-04 15:57:30
Message-ID: CAFiTN-u-NX=14XmvCvwexxTsk9wfGdnYgQx=bn38TxH3ZgBFkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Review comments:
> 1.
> + bool is_partial);
> +
>
> Seems additional new line is not required.
Fixed
>
> 2.
> + * try_partial_mergejoin_path
> + * Consider a partial merge join path; if it appears useful,
> push it into
> + * the joinrel's pathlist via add_path().
> + */
>
> I think in above comment, you should say ".. joinrel's
> partial_pathlist via add_partial_path()"
Fixed
>
> 3.
> + /*
> + * See comments in try_partial_nestloop_path().
> + */
>
> This same code exists in try_partial_nestloop_path() and
> try_partial_hashjoin_path() with minor difference of code in
> try_partial_hashjoin_path(). Can't we write a separate inline
> function for this code and call from all the three places.

It's a small check, is it ok to make it the separate function. I have
also observed similar kind of duplicate code in try_mergejoin_path and
try_hashjoin_path. However, if you think that it will be better to
move that check to an inline function I can submit a seperate patch in
this thread as a refactoring patch?

I observed one more issue that in case of partial merge join
inner_paramrels should be empty not subset of outer, so fixed the same
in the attached version. The code in the previous patch will also not
create any problem, because if the inner path is parameterized by
outerrel it will not create any merge join path.

>
> 4.
> + /*
> + * See comments in try_nestloop_path().
> + */
> + initial_cost_mergejoin(root,
> &workspace, jointype, mergeclauses,
> + outer_path,
> inner_path,
> + outersortkeys, innersortkeys,
> +
> extra->sjinfo);
>
> I think in above comments, you want to say try_partial_nestloop_path().

Fixed
> 5.
> - if (joinrel->consider_parallel && nestjoinOK &&
> - save_jointype != JOIN_UNIQUE_OUTER &&
> - bms_is_empty(joinrel->lateral_relids))
> + if (!joinrel->consider_parallel ||
> + save_jointype == JOIN_UNIQUE_OUTER ||
> + !bms_is_empty(joinrel->lateral_relids) ||
> + jointype == JOIN_FULL ||
> + jointype == JOIN_RIGHT)
> + return;
> +
> + if (nestjoinOK)
>
> Here, we only want to create partial paths when
> outerrel->partial_pathlist is not NILL, so I think you can include
> that check as well.
Done.

Another point to note is that in HEAD, we are not
> checking for jointype as JOIN_RIGHT or JOIN_FULL for considering
> parallel nestloop paths, whereas with your patch, it will do those
is al> checks, is it a bug of HEAD or is there any other reason for including
> those checks for parallel nestloop paths?

Actually we can not create parallel join path in case of JOIN_RIGHT or
JOIN_FULL. nestjoinOK is marked false in case of JOIN_RIGHT or
JOIN_FULL. So it was not considered in base code as well. Now we have
mergejoin as well so it's better to keep a common check.
>
> 6.
> + /* Can't generate mergejoin path if inner rel is parameterized by outer */
> + if (inner_cheapest_total != NULL)
> + {
> + ListCell *lc1;
> +
> + /* generate merge join path for each partial outer path */
> + foreach(lc1, outerrel->partial_pathlist)
> + {
> + Path *outerpath = (Path *) lfirst(lc1);
> + List *merge_pathkeys;
> +
> + /*
> + * Figure out what useful ordering any paths we create
> + * will have.
> + */
> + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
> + outerpath->pathkeys);
> +
> + generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
> + save_jointype, extra, false,
> + inner_cheapest_total, merge_pathkeys,
> + true);
> + }
> +
> + }
>
> Won't it be better if you encapsulate the above chunk of code in
> function consider_parallel_merjejoin() similar to
> consider_parallel_nestloop()? I think that way code will look
> cleaner.

Done
>
> 7.
> + /*
> + * Figure out what useful ordering any paths we create
> + * will have.
> + */
> + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
> + outerpath->pathkeys);
>
> indentation problem with variable outerpath->pathkeys.

Fixed
>
> 8.
> - try_mergejoin_path(root,
> - joinrel,
> - outerpath,
> - inner_cheapest_total,
> - merge_pathkeys,
> - mergeclauses,
> - NIL,
> - innersortkeys,
> - jointype,
> - extra);
> + if (!is_partial)
> + try_mergejoin_path(root,
> + joinrel,
> + outerpath,
> + inner_cheapest_total,
> + merge_pathkeys,
> + mergeclauses,
> + NIL,
> + innersortkeys,
> + jointype,
> + extra);
> +
> + /* Generate partial path if inner is parallel safe. */
> + else if (inner_cheapest_total->parallel_safe)
> + try_partial_mergejoin_path(root,
> + joinrel,
> + outerpath,
> + inner_cheapest_total,
> + merge_pathkeys,
> + mergeclauses,
> + NIL,
> + innersortkeys,
> + jointype,
> + extra);
>
> In above code indentation is broken, similarly, there is another code
> in a patch where it is broken, try using pgindent.

Fixed with pgindent.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_mergejoin_v4.patch application/octet-stream 11.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-01-04 16:07:06 Re: merging some features from plpgsql2 project
Previous Message Tom Lane 2017-01-04 15:57:20 Re: [PATCH] Reload SSL certificates on SIGHUP