Re: Proposal : Parallel Merge Join

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-03-07 10:16:31
Message-ID: CAFiTN-szCEcZrQm0i_w4xqSaRUTOUFstNu32Zn4rxxDcoa8gnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 7, 2017 at 5:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> + /* Can't do anything else if inner path needs to be unique'd */
> + if (save_jointype == JOIN_UNIQUE_INNER)
> + return;
>
> Right after this, you should try_partial_mergejoin_path() with the
> result of get_cheapest_parallel_safe_total_inner(), just like
> sort_inner_and_outer() already does.
>
> I think with some work you could merge generate_mergejoin_paths with
> generate_partial_mergejoin_paths instead of duplicating all the code.
> They're not really that different, and you could reduce the
> differences further if you made try_mergejoin_path() take an
> additional is_partial argument and pass the call to
> try_partial_mergejoin_path() if it's true. The various bits of
> special handling related to join types that aren't supported in
> parallel queries aren't going to help you in parallel mode, but they
> won't hurt either. This bit:
>
> + /* Generate partial path if inner is parallel safe. */
> + if (inner_cheapest_total->parallel_safe)
> + try_partial_mergejoin_path(root,
>
> ...is really not right. That value is being passed down from
> consider_parallel_mergejoin(), which is getting it from
> match_unsorted_outer(), which really shouldn't be calling this code in
> the first place with a path that's not parallel-safe. What it ought
> to do is (a) pass inner_cheapest_total if that's non-NULL and
> parallel-safe, or else (b) call
> get_cheapest_parallel_safe_total_inner() and pass that value, (c)
> unless that's NULL also in which case it should skip the call
> altogether.

I have tried to address above comments in the latest patch, But there
is one part of the patch which I am not quite sure yet. So still this
patch is not a final one.

+ *
+ * If inner_cheapest_total is not NULL or not a parallel safe path then
+ * find the cheapest total parallel safe path.
+ */
+ if (inner_cheapest_total == NULL ||
+ inner_cheapest_total->parallel_safe == false)
+ {
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ }
+
+ if (inner_cheapest_total)
+ consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
+ save_jointype, extra,
+ inner_cheapest_total);

I am confused about whether to call
"get_cheapest_parallel_safe_total_inner" with
innerrel->cheapest_parameterized_paths like we do in case of
hash_inner_and_outer or with
innerrel->pathlist. The reason behind I am calling this with complete
pathlist (innerrel->pathlist) is that inside generate_mergejoin_paths
it calls "get_cheapest_path_for_pathkeys" for complete pathlist and if
we decide not to call consider_parallel_mergejoin if the cheapest
parallel safe path in innerrel->cheapest_parameterized_paths is NULL
then it will not be fair (we might have some parallel safe paths in
innerrel->pathlist).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_mergejoin_v10.patch application/octet-stream 13.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2017-03-07 10:18:11 Re: SCRAM authentication, take three
Previous Message Oleg Bartunov 2017-03-07 09:21:59 Re: SQL/JSON in PostgreSQL