Re: Proposal : Parallel Merge Join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : Parallel Merge Join
Date: 2017-02-26 06:31:59
Message-ID: CA+TgmobdW2au1Jq5L4ySA2ZhqFmA-qNvD7ZFaZbJWm3c0ysWyw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 24, 2017 at 3:54 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I agree in some cases it could be better, but I think benefits are not
> completely clear, so probably we can leave it as of now and if later
> any one comes with a clear use case or can see the benefits of such
> path then we can include it.

TBH, I think Dilip had the right idea here. cheapest_total_inner
isn't anything magical; it's just that there's no reason to use
anything but the cheapest path for a relation when forming a join path
unless that first path lacks some property that you need, like having
the right sortkeys or being parallel-safe. Since there are lots of
join paths that just want to do things in the cheapest way possible,
we identify the cheapest path and hang on to it; when that's not what
we need, we go find the path or paths that have the properties we
want. I'm not sure why this case should be an exception. You could
argue that if the cheapest parallel-safe path is already more
expensive, then the parallel join may not pay off, but that's hard to
say; it depends on what happens higher up in the plan tree. That's
why the current code handles partial hash joins this way:

    /*
     * Normally, given that the joinrel is parallel-safe, the cheapest
     * total inner path will also be parallel-safe, but if not, we'll
     * have to search cheapest_parameterized_paths for the cheapest
     * safe, unparameterized inner path.  If doing JOIN_UNIQUE_INNER,
     * we can't use any alternative inner path.
     */
    if (cheapest_total_inner->parallel_safe)
        cheapest_safe_inner = cheapest_total_inner;
    else if (save_jointype != JOIN_UNIQUE_INNER)
    {
        ListCell   *lc;

        foreach(lc, innerrel->cheapest_parameterized_paths)
        {
            Path       *innerpath = (Path *) lfirst(lc);

            if (innerpath->parallel_safe &&
                bms_is_empty(PATH_REQ_OUTER(innerpath)))
            {
                cheapest_safe_inner = innerpath;
                break;
            }
        }
    }

I would tend to think this case ought to be handled in a similar way.
And if not, then we ought to go change the hash join logic too so that
they match.
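
To make the shape of that fallback concrete outside the planner, here is
a minimal standalone sketch in C. SketchPath, SketchRel and
sketch_cheapest_safe_inner are hypothetical names invented for
illustration, not PostgreSQL structures; the "parameterized" flag stands
in for the bms_is_empty(PATH_REQ_OUTER(...)) test, and a plain array
stands in for the cheapest_parameterized_paths list. It assumes, like
the loop quoted above, that the first safe, unparameterized entry is the
one to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner path (not the real Path). */
typedef struct SketchPath
{
    double      total_cost;
    bool        parallel_safe;
    bool        parameterized;  /* models a non-empty PATH_REQ_OUTER() */
} SketchPath;

/* Hypothetical stand-in for the pieces of a relation this logic reads. */
typedef struct SketchRel
{
    const SketchPath *cheapest_total_path;
    const SketchPath **cheapest_parameterized_paths;   /* array, not a List */
    size_t      npaths;
} SketchRel;

/*
 * Return the inner path a partial join could use: the cheapest total path
 * if it is parallel-safe, otherwise the first parallel-safe, unparameterized
 * entry among the alternatives.  Returns NULL when nothing qualifies, or
 * when unique_inner is set and the cheapest total path is not safe.
 */
const SketchPath *
sketch_cheapest_safe_inner(const SketchRel *rel, bool unique_inner)
{
    size_t      i;

    assert(rel != NULL && rel->cheapest_total_path != NULL);

    if (rel->cheapest_total_path->parallel_safe)
        return rel->cheapest_total_path;

    /* Models the JOIN_UNIQUE_INNER case: no alternative inner path allowed. */
    if (unique_inner)
        return NULL;

    for (i = 0; i < rel->npaths; i++)
    {
        const SketchPath *p = rel->cheapest_parameterized_paths[i];

        if (p->parallel_safe && !p->parameterized)
            return p;
    }

    return NULL;
}
```

If the same helper were shared by the hash join and merge join cases,
the two would match by construction, which is the consistency argued for
above; whether a shared helper is the right factoring in the real
planner code is a separate question.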

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
