Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-20 13:44:16
Message-ID: CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR=PbXj1e=9a=qMoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2017 at 8:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 17, 2017 at 8:10 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> While I was studying what you did with reparameterize_path_by_child(),
>> I started to wonder whether reparameterize_path() doesn't need to
>> start handling join paths. I think it only handles scan paths right
>> now because that's the only thing that can appear under an appendrel
>> created by inheritance expansion, but you're changing that. Maybe
>> it's not critical -- I think the worst consequences of missing some
>> handling there is that we won't consider a parameterized path in some
>> case where it would be advantageous to do so. Still, you might want
>> to investigate a bit.
>
> I spent a fair amount of time this weekend musing over
> reparameterize_path_by_child(). I think a key question for this patch
> - as you already pointed out - is whether we're happy with that
> approach. When we discover that we want to perform a partitionwise
> parameterized nestloop, and therefore that we need the paths for each
> inner appendrel to get their input values from the corresponding outer
> appendrel members rather than from the outer parent, we've got two
> choices. The first is to do what the patch actually does, which is to
> build a new path tree for the nestloop inner path parameterized by the
> appropriate childrel. The second is to use the existing paths, which
> are parameterized by the parent rel, and then somehow allow make that
> work. For example, you can imagine that create_plan_recurse() could
> pass down a list of parameterized nestloops above the current point in
> the path tree, and a parent-child mapping for each, and then we could
> try to substitute everything while actually generating the plan
> instead of creating paths sooner. Which is better?
>
> It would be nice to hear opinions from anyone else who cares, but
> after some thought I think the approach you've picked is probably
> better, because it's more like what we do already. We have existing
> precedent for reparameterizing a path, but none for allowing a Var for
> one relation (the parent) to in effect refer to another relation (the
> child).

Right. If we could use parent Vars to indicate parent Var or child Var
depending upon the context, a lot of memory issues would be solved; we
wouldn't need to translate a single expression. But I think that's not
straight forward. I have been thinking about some kind of polymorphic
Var node, but it seems a lot more invasive change. Although, if we
could get something like that, we would save a huge memory. :)

>
> That having been said, having try_nestloop_path() perform the
> reparameterization at the very top of the function seems quite
> undesirable. You're creating a new path there before you know whether
> it's going to be rejected by the invalid-parameterization test and
> also before you know whether initial_cost_nestloop is going to reject
> it. It would be much better if you could find a way to postpone the
> reparameterization until after those steps, and only do it if you're
> going to try add_path().

Hmm. I think we can do that by refactoring
calc_nestloop_required_outer(), allow_star_schema_join() and
have_dangerous_phv() to use relids instead of paths. If the checks
pass for a join between parents, those should pass for joins between
children. Done in the attached set of patches.

try_nestloop_path has few new variables. Among those innerrelids and
outerrelids indicate the relids to be used by the parameterization
checks (see patch for details). They are not relids of inner and outer
relations resp. but kind of effective relids to be used. But I
couldn't come up with better names which convey proper meaning and
still are short enough. effective_innerrelids is mouthful.

I am wondering whether we need to change
calc_non_nestloop_required_outer() similar to
calc_nestloop_required_outer() just to keep their signatures in sync.

Should I work on completing reparamterized_path_by_child() to support
all kinds of paths?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v8.zip application/zip 56.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stas Kelvich 2017-03-20 13:47:50 Re: logical decoding of two-phase transactions
Previous Message Ashutosh Bapat 2017-03-20 13:44:05 Re: Partition-wise join for join between (declaratively) partitioned tables