Re: Proposal : Parallel Merge Join

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-03-06 13:15:48
Message-ID: CAFiTN-uvTVQOgXr5O1wmCdakqpYerYLs=11Fnz2gGr0EtDSNJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Fri, Mar 3, 2017 at 3:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm not happy with the way this patch can just happen to latch on to a
> path that's not parallel-safe rather than one that is and then just
> give up on a merge join in that case. I already made this argument in
> https://www.postgresql.org/message-id/CA+TgmobdW2au1Jq5L4ySA2ZhqFmA-qNvD7ZFaZbJWm3c0ysWyw@mail.gmail.com
> and my opinion hasn't changed. I've subsequently come to realize that
> this problem is more widespread that I initially realized: not only do
> sort_inner_and_outer() and generate_partial_mergejoin_paths() give up
> without searching for the cheapest parallel-safe path, but the latter
> later calls get_cheapest_path_for_pathkeys() and then just pretends it
> didn't find anything unless the result is parallel-safe. This makes
> no sense to me: the cheapest parallel-safe path could be 1% more
> expensive than the cheapest path of any kind, but because it's not the
> cheapest we arbitrarily skip it, even though parallelizing more of the
> tree might leave us way ahead overall.

for sort_inner_and_outer I have followed the same logic what
hash_inner_and_outer is doing. Also moved the logic of selecting the
cheapest_safe_inner out of the pathkey loop.

>
> I suggest that we add an additional "bool require_parallel_safe"
> argument to get_cheapest_path_for_pathkeys(); when false, the function
> will behave as at present, but when true, it will skip paths that are
> not parallel-safe. And similarly have a function to find the cheapest
> parallel-safe path if the first one in the list isn't it. See
> attached draft patch. Then this code can search for the correct path
> instead of searching using the wrong criteria and then giving up if it
> doesn't get the result it wants.

Done as suggested. Also, rebased the path_search_for_mergejoin on
head and updated the comments of get_cheapest_path_for_pathkeys for
new argument.

>
> + if (!(joinrel->consider_parallel &&
> + save_jointype != JOIN_UNIQUE_OUTER &&
> + save_jointype != JOIN_FULL &&
> + save_jointype != JOIN_RIGHT &&
> + outerrel->partial_pathlist != NIL &&
> + bms_is_empty(joinrel->lateral_relids)))
>
> I would distribute the negation, so that this reads if
> (!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER ||
> save_jointype == JOIN_FULL || ...). Or maybe better yet, just drop
> the ! and the return that follows and put the
> consider_parallel_nestloop and consider_parallel_mergejoin calls
> inside the if-block.

Done
>
> You could Assert(nestjoinOK) instead of testing it, although I guess
> it doesn't really matter.

left as it is.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_mergejoin_v9.patch application/octet-stream 16.0 KB
path-search-for-mergejoin-v2.patch application/octet-stream 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-03-06 13:17:59 Re: PATCH: Configurable file mode mask
Previous Message Giuseppe Broccolo 2017-03-06 12:49:24 dump a comment of a TSDictionary