Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2016-09-09 09:47:17
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Hi All,

PFA the patch to support partition-wise joins for partitioned tables. The
is based on the declarative parition support patches provided by Amit
on 26th August 2016. The previous patch added support to assess whether two
tables can be joined using partition-wise join technique, but did not have
complete support to create plans which used partition-wise technique. This
patch implements three important pieces for supporting partition-wise join

1. Logic to assess whether join between two partitioned tables can be
using partition-wise join technique.
2. Construct RelOptInfo's representating join between matching partitions of
the joining relations and add join paths to those RelOptInfo's
3. Add append paths to the RelOptInfo representing the join between
tables. Rest of the planner code chooses the optimal path for join.

make_join_rel() now calls try_partition_wise_join(), which executes all of
steps listed above. If the joining partitioned relations are deemed fit for
partition-wise join, we create one RelOptInfo (if not already present)
representing a join between every pair of partitions to be joined. Since the
join between parents is deemed legal, the join between the partitions is
legal, hence legality of the join is not checked again. RelOptInfo
the join between partitions is constructed by translating the relevant
of RelOptInfo of the parent join relation. Similarly SpecialJoinInfo,
restrictlist (for given join order) are constructed by translating those for
the parent join.

make_join_rel() is split into two portions, a. that deals with constructing
restrictlist and RelOptInfo for join relation b. that creates paths for the
join. The second portion is separated into a function
populate_joinrel_with_paths(), which is reused in try_partition_wise_join()
create paths for join between matching partitions.

set_append_rel_pathlist() generates paths for child relations, marks the
children as dummy relations and creates append paths by collecting paths
similar properties (parameterization and pathkeys) from non-empty children.
then adds append paths to the parent relation. This patch divides
set_append_rel_pathlist() into two parts a. marking empty child relations as
dummy and generating paths for non-empty children. b. collecting children
into append paths for parent. Part b is separate into a function
add_paths_to_append_rel() which is reused for collecting paths from
partition-wise join child relations to construct append paths for join
partitioned tables.

For an N-way join between partitioned tables, make_join_rel() is called as
times as the number of valid join orders exist. For each such call, we will
paths to join between partitions for corresponding join order between those
partitions. We can generate the append paths for parent joinrel only after
such join orders have been considered. Hence before setting cheapest path
parent join relation, we set the cheapest path for each join relation
partitions, followed by creating append paths for the parent joinrel. This
method needs some readjustment for multi-level partitions (TODO item 2

A GUC enable_partition_wise_join is added to enable or disable
join technique. I think the GUC is useful similar to other join related GUCs
like enable_hashjoin.

parameterized paths: While creating parameterized paths for child relations
a partitioned tables, we do not have an idea as to whether we will be able
use partition-wise join technique or not. Also we do not know the child
partition of the other partitioned table, to which a given partition would
join. Hence we do not create paths parameterized by child partitions of
partitioned relations. But path for child of a partitioned relation
parameterized by other parent relation, can be considered to be
by any child relation of the other partitioned relation by replacing the
parameters by corresponding child parameters. This condition is used to
eliminate parameterized paths while creating merge and hash joins, to decide
the resultant parameterization of a join between child partitions and to
nested loop paths with inner path parameterized by outer relation where
and outer relations are child partitions. While creating such nest loop join
paths we translate the path parameterized by other parent partitioned
to that parameterized by the required child.

Functions like select_outer_pathkeys_for_merge(), make_sort_from_pathkeys(),
find_ec_member_for_tle() which did not expect to be called for a child
relation, are now used for child partition relations for joins. These
are adjusted for that usage.

I have added partition_join.sql testcase to test partition-wise join
That file has extensive tests for list, range, multi-level partitioning
and various kinds of joins including nested loop join with inner relation
parameterized by outer relationThat file has extensive tests for list,
multi-level partitioning schemes and various kinds of joins including nested
loop join with inner relation parameterized by outer relation.

make check passes clean.


1. Instead of storing partitioning information in RelOptInfo of each of the
partitioned relations (base and join relations), we can keep a list of
canonical partition schemes in PlannerInfo. Every RelOptInfo gets a pointer
the member of list representing the partitioning scheme of corresponding
relation. RelOptInfo's of all similarly partitioned relations get the same
pointer thus making it easy to match the partitioning schemes by comparing
pointers. While we are supporting only exact partition matching scheme now,
it's possible to extend this method to match compatible partitioning
schemes by
maintaining a list of compatible partitioning schemes.

Right now, I have moved some partition related structures from partition.c
partition.h. These structures are still being reviewed and might change when
Amit Langote improves his patches. Having canonical partitioning scheme in
PlannerInfo may not require moving those structures out. So, that code is
under development. A related change is renaming RangeBound structure in Amit
Langote's patches to PartitionRangeBound to avoid name conflict with
rangetypes.h. That change too should vanish once we decide where to keep
structure and its final name.

2. Multi-level partitioned tables: For some reason path created for joining
partitions are not being picked up as the cheapest paths. I think, we need
finalize the lower level paths before moving upwards in the partition
hierarchy. But I am yet to investigate the issue here.
should point to top parents rather than immediate parents.

3. Testing: need more tests for testing partition-wise join with foreign
as partitions. More tests for parameterized joins for multi-level

4. Remove bms_to_char(): I have added this function to print Relids in the
debugger. I have found it very useful to quickly examine Relids in debugger,
which otherwise wasn't so easy. If others find it useful too, I can create a
separate patch to be considered for a separate commit.

5. In add_paths_to_append_rel() to find the possible set of outer relations
generating parameterized paths for a given join. This code needs to be
to eliminate the parent relations possible set of outer relations for a join
between child partitions.

6. Add support to reparameterize more types of paths for child relations. I
will add this once we finalize the method to reparameterize a parent path
child partition.

7. The patch adds make_joinrel() (name needs to be changed because of its
similariy with make_join_rel()) to construct an empty RelOptInfo for a join
between partitions. The function copies code doing the same from
build_join_rel(). build_join_rel() too can use this function, if we decide
retain it.

8. Few small TODOs related to code reorganization, proper function,
variable naming etc. are in the patch. pg_indent run.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join.patch text/x-patch 516.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-09-09 10:22:55 Re: CF app and patch series
Previous Message Craig Ringer 2016-09-09 09:31:39 Re: ICU integration