Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-20 13:44:05
Message-ID: CAFjFpRd1VJrRCa80cELEGsj6yZF-YGkmPZhW_ZyJi5wu3w1xRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 18, 2017 at 5:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 17, 2017 at 9:15 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> This set of patches fixes both of those things.
>
> 0001 changes the purpose of a function and then 0007 renames it. It
> would be better to include the renaming in 0001 so that you're not
> taking multiple whacks at the same function in the same patch series.

adjust_relid_set was renamed as adjust_child_relids() post
"extern"alising. I think, this comment is about that function. Done.

> I believe it would also be best to include 0011's changes to
> adjust_appendrel_attrs_multilevel in 0001.
>

The function needs to repeat the "adjustment" process for every
"other" relation (join or base) that it encounters, by testing using
OTHER_BASE_REL or OTHER_JOINREL in short IS_OTHER_REL(). The last
macros are added by the partition-wise join implementation patch 0005.
It doesn't make sense to add that macro in 0001 OR modify that
function twice, once in 0001 and then after 0005. So, I will leave it
to be part of 0011, where the changes are actually needed.

> 0002 should either add find_param_path_info() to the relevant header
> file as extern from the beginning, or it should declare and define it
> as static and then 0007 can remove those markings. It makes no sense
> to declare it as extern but put the prototype in the .c file.

Done, added find_param_path_info() as an extern definition to start
with. I have also squashed 0001 and 0002 together, since they are both
refactoring patches and from your next mail about
reparameterize_path_by_child(), it seems that we are going to accept
the approach in that patch.

>
> 0004 still needs to be pared down. If you want to get something
> committed this release cycle, you have to get these details taken care
> of, uh, more or less immediately. Actually, preferably, several weeks
> ago. You're welcome to maintain your own test suite locally but what
> you submit should be what you are proposing for commit -- or if not,
> then you should separate the part proposed for commit and the part
> included for dev testing into two different patches.
>

Done. Now SQL file has 325 lines and output has 1697 lines as against
515 and 4085 lines resp. earlier.

> In 0005's README, the part about planning partition-wise joins in two
> phases needs to be removed.

Done.

> This patch also contains a small change
> to partition_join.sql that belongs in 0004.

The reason I added the test patch prior to implementation was 1. for
me to make sure the tests that the queries run without the
optimization and the results they produce to catch any issues with
partitioning implementation. That would help someone looking at those
patches as well. 2. Once partitioning implementation patch was
applied, once could see the purpose of changes in two follow on
patches. Now that that purpose has served, I have reordered the
patches so that test patch comes after the implementation and follow
on fixes. If you still want to run the test before or after any of
those patches, you could apply the patch separately.

>
> 0008 removes direct tests against RELOPT_JOINREL almost everywhere,
> but it overlooks the new ones added to postgres_fdw.c by
> b30fb56b07a885f3476fe05920249f4832ca8da5. It should be updated to
> cover those as well, I suspect.

Done.

deparseSubqueryTargetList() and some other functions are excluding
"other" base relation from the assertions. I guess, that's a problem.
Will submit a separate patch to fix this.

> The commit message claims that it
> will "Similarly replace RELOPT_OTHER_MEMBER_REL test with
> IS_OTHER_REL() where we want to test for child relations of all kinds,
> but in fact it makes exactly zero such substitutions.

The relevant changes have been covered by other commits. Removed this
line from the commit message.

>
> While I was studying what you did with reparameterize_path_by_child(),
> I started to wonder whether reparameterize_path() doesn't need to
> start handling join paths. I think it only handles scan paths right
> now because that's the only thing that can appear under an appendrel
> created by inheritance expansion, but you're changing that. Maybe
> it's not critical -- I think the worst consequences of missing some
> handling there is that we won't consider a parameterized path in some
> case where it would be advantageous to do so. Still, you might want
> to investigate a bit.

Yes, we need to update reparameterize_path() for child-joins. A path
for child base relation gets reparameterized, if there exists a path
with that parameterization in at least one other child. The
parameterization bubbles up the join tree from base relations. So, if
a child required to be reparameterized, probably all its joins require
reparameterization, since that parameterization would bubble up the
child-join tree in which some other child participates. But as you
said it's an optimization and not a correctness issue. The function
get_cheapest_parameterized_child_path() returns NULL, if it can not
find or create a path (by reparameterization) with required
parameterization. Its caller add_paths_to_append_rel() is capable of
handling NULL values by not creating append paths with that
paramterization. If the "append" relation requires minimum
parameterization, all its children will create that minimum
parameterization, hence do not require to reparameterize path. So,
there isn't any correctness issue there.

There are two ways to fix it,

1. when we create a reparameterized path add it to the list of paths,
thus the parameterization bubbles up the join tree. But then we will
be changing the path list after set_cheapest() has been called OR may
be throwing out paths which other paths refer to. That's not
desirable. May be we can save this path in another list and create
join paths using this path instead of reparameterizing existing join
paths.
2. Add code to reparameterize_path() to handle join paths, and I think
all kinds of paths since we might have trickle the parameterization
down the joining paths which could be almost anything including
sort_paths, unique_paths etc. That looks like a significant effort. I
think, we should attack it separately after the stock partition-wise
join has been committed.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-03-20 13:44:16 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Craig Ringer 2017-03-20 13:39:23 Re: logical decoding of two-phase transactions