Re: Proposal : Parallel Merge Join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-03-03 10:27:05
Message-ID: CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Wed, Mar 1, 2017 at 12:01 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Mar 1, 2017 at 11:24 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> I think for now we can keep the parallel safety check for cheapest
>>> inner path, though it will be of use only for the very first time we
>>> compare the paths in that loop. I am not sure if there is any other
>>> better way to handle the same.
>>
>> Done that way.
>
> Thanks, your patch looks good to me.

I'm not happy with the way this patch can just happen to latch on to a
path that's not parallel-safe rather than one that is and then just
give up on a merge join in that case. I already made this argument in
https://www.postgresql.org/message-id/CA+TgmobdW2au1Jq5L4ySA2ZhqFmA-qNvD7ZFaZbJWm3c0ysWyw@mail.gmail.com
and my opinion hasn't changed. I've subsequently come to realize that
this problem is more widespread that I initially realized: not only do
sort_inner_and_outer() and generate_partial_mergejoin_paths() give up
without searching for the cheapest parallel-safe path, but the latter
later calls get_cheapest_path_for_pathkeys() and then just pretends it
didn't find anything unless the result is parallel-safe. This makes
no sense to me: the cheapest parallel-safe path could be 1% more
expensive than the cheapest path of any kind, but because it's not the
cheapest we arbitrarily skip it, even though parallelizing more of the
tree might leave us way ahead overall.

I suggest that we add an additional "bool require_parallel_safe"
argument to get_cheapest_path_for_pathkeys(); when false, the function
will behave as at present, but when true, it will skip paths that are
not parallel-safe. And similarly have a function to find the cheapest
parallel-safe path if the first one in the list isn't it. See
attached draft patch. Then this code can search for the correct path
instead of searching using the wrong criteria and then giving up if it
doesn't get the result it wants.

+ if (!(joinrel->consider_parallel &&
+ save_jointype != JOIN_UNIQUE_OUTER &&
+ save_jointype != JOIN_FULL &&
+ save_jointype != JOIN_RIGHT &&
+ outerrel->partial_pathlist != NIL &&
+ bms_is_empty(joinrel->lateral_relids)))

I would distribute the negation, so that this reads if
(!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER ||
save_jointype == JOIN_FULL || ...). Or maybe better yet, just drop
the ! and the return that follows and put the
consider_parallel_nestloop and consider_parallel_mergejoin calls
inside the if-block.

You could Assert(nestjoinOK) instead of testing it, although I guess
it doesn't really matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
path-search-for-mergejoin-v1.patch application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2017-03-03 11:01:57 Re: [POC] hash partitioning
Previous Message Kyotaro HORIGUCHI 2017-03-03 10:19:30 Re: [BUG FIX] Removing NamedLWLockTrancheArray