Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-08-08 08:51:02
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Thu, Aug 3, 2017 at 7:01 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 31, 2017 at 9:07 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> Forgot the patch set. Here it is.
> The commit message for 0005 isn't really accurate given that it
> follows 0004. I think you could just flatten 0005 and 0006 into one
> patch.

Earlier, there was some doubt about the approach for expanding
multi-level partitioned table's inheritance hierarchy. So, I had
separated all multi-level partition related changes into patches by
themselves, collocating them with their respective single level
partition peers. I thought that would make the reviews easier while
leaving the possibility of committing single-level partition-wise
support before multi-level partition-wise join support. From your
previous replies, it seems that you are fine with the multi-level
partitioned hierarchy expansion, so it may be committed along-with
other patches. So, I have squashed those two patches together.
Similarly I have squashed pairs 0008-0009 and 0012-0013. Those dealt
with similar issues for single-level partitioned and multi-level
partitioned tables.

> Reviewing those together:
> - Existing code does partdesc = RelationGetPartitionDesc(relation) but
> this has got it as part_desc. Seems better to be consistent.
> Likewise existing variables for PartitionKey are key or partkey, not
> part_key.


> - get_relation_partition_info has a useless trailing return.


> - Instead of adding nparts, boundinfo, and part_oids to RelOptInfo,
> how about just adding partdesc? Seems cleaner.

nparts and boundinfo apply to any kind of relation simple, join or
upper but part_oids applies only to simple relations. So, I have split
those members and added them in respective sections. Do you still
think that we should add PartitionDesc as a single member?

Similar to your suggestion of changing name of part_key to partkey,
should we rename part_scheme as partscheme, part_rels as partrels and
part_oids as partoids?

> - pkexprs seems like a potentially confusing name, since PK is widely
> used to mean "primary key" but here you mean "partition key". Maybe
> partkeyexprs.

agreed. Done. PartitionKey structure has member partexprs for
partition keys which are expressions. I have used the same name
instread of pkexprs.

> - build_simple_rel's matching algorithm is O(n^2). We may have talked
> about this problem before...

If root->append_rel_list has AppendRelInfos in the same order as the
partition bounds, we could reduce this to O(n). That expansion option
is being discussed in [1]. Once we commit it, I will change the code
to make it O(n). Right now, we can not rely on the order of
AppendRelInfos in root->append_rel_list.

> - This patch introduces some bits that are not yet used, like
> nullable_pkexprs,

We could fix that by adding that member in 0008. IIRC, earlier you had
complained about declaring a structure in one patch and adding members
to it in the subsequent patches, so I just added all members in the
same patch. BTW, I have renamed that member to nullable_partexprs to
be consistent with change to pkexpers.

> or even the code to set the partition scheme for
> joinrels. I think perhaps some of that logic should be moved from
> 0008 to here - e.g. the initial portion of
> build_joinrel_partition_info.

Setting part_scheme for joinrel should really be part of the patch
which actually implements partition-wise join. That will keep all the
partition-wise join implementation together. 0005 and 0006 really just
introduce PartitionScheme for base relation. I think PartitionScheme
and other partitioning properties for base relation are useful for
something else like partition-wise aggregation on base relation. So,
we may want to commit those two patches separately. If you want, we
could squash the partition scheme and partition-wise join
implementation together.


Updated patches attached.
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v24.tar.gz application/x-gzip 149.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-08-08 09:00:51 Re: free space % calculation in pgstathashindex
Previous Message Craig Ringer 2017-08-08 08:00:43 Re: WIP: Failover Slots