Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-02-06 10:57:18
Message-ID: CAFjFpRePKH95_mTbEn2nj_CHv431+=4OE=1fDuS7O2MvLR_d7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

sent the previous mail before completing my reply. Sorry for that.
Here's the rest of the reply.

>>
>> + if (bms_num_members(outer_relids) > 1)
>>
>> Seems like bms_get_singleton_member could be used.
>>
>> + * Partitioning scheme in join relation indicates a possibilty that the
>>
>> Spelling.

Will take care of this.

>>
>> There seems to be no reason for create_partition_plan to be separated
>> from create_plan_recurse. You can just add another case for the new
>> path type.

Will take care of this.

>>
>> Why does create_partition_join_path need to be separate from
>> create_partition_join_path_with_pathkeys? Couldn't that be combined
>> into a single function with a pathkeys argument that might sometimes
>> be NIL? I assume most of the logic is common.

Agreed. will take care of this.

>>
>> From a sort of theoretical standpoint, the biggest danger of this
>> patch seems to be that by deferring path creation until a later stage
>> than normal, we could miss some important processing.
>> subquery_planner() does a lot of stuff after
>> expand_inherited_tables(); if any of those things, especially the ones
>> that happen AFTER path generation, have an effect on the paths, then
>> this code needs to compensate for those changes somehow. It seems
>> like having the planning of unsampled children get deferred until
>> create_plan() time is awfully surprising; here we are creating the
>> plan and suddenly what used to be a straightforward path->plan
>> translation is running around doing major planning work. I can't
>> entirely justify it, but I somehow have a feeling that work ought to
>> be moved earlier. Not sure exactly where.

I agree with this. Probably we should add a path tree mutator before
SS_identify_outer_params() to replace any Partition*Paths with
Merge/Append paths. The mutator will create paths for child-joins
within temporary memory context, copy the relevant paths and create
Merge/Append paths. There are two problems there 1. We have to write
code to copy paths; most of the paths would be flat copy but custom
scan paths might have some unexpected problems. 2. There will be many
surviving PartitionPaths, and all the corresponding child paths would
need copying and consume memory. In order to reduce that consumption,
we have run this mutator after set_cheapest() in subquery_planner();
but then nothing interesting happens between that and create_plan().
Expanding PartitionPaths during create_plan() does not need any path
copying and we expand only the PartitionPaths which will be converted
to plans. That does save a lot of memory; the reason why we defer
creating paths for child-joins.

>>
>> This is not really a full review, mostly because I can't easily figure
>> out the motivation for all of the changes the patch makes. It makes a
>> lot of changes in a lot of places, and it's not really very easy to
>> understand why those changes are necessary. My comments above about
>> splitting the patch into a series of patches that can potentially be
>> reviewed and applied independently, with the main patch being the last
>> in the series, are a suggestion as to how to tackle that. There might
>> be some work that needs to or could be done on the comments, too. For
>> example, the patch splits out add_paths_to_append_rel from
>> set_append_rel_pathlist, but the comments don't say anything helpful
>> like "we need to do X after Y, because Z". They just say that we do
>> it. To some extent I think the comments in the optimizer have that
>> problem generally, so it's not entirely the fault of this patch;
>> still, the lack of those explanations makes the code reorganization
>> harder to follow, and might confuse future patch authors, too.

Specifically about add_paths_to_append_rel(), what do you expect the
comment to say? It would be obvious why we split that functionality
into a separate function: in fact, we don't necessarily explain why
certain code resides in a separate function in the comments. I think,
that particular comment (or for that matter other such comments in the
optimizer) can be removed altogether, since it just writes the
function names as an "English" sentence. I sometimes find those
comments useful, because I can read just those comments and forget
about the code, making comprehension easy. If highlighting is ON, your
brain habitually ignores the non-comment portions when required. I am
open to suggestions.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2017-02-06 11:03:36 Re: Possible spelling fixes
Previous Message Nikita Glukhov 2017-02-06 10:55:27 Re: [PATCH] kNN for btree