Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2016-11-11 12:50:15
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

> So, I am thinking about your approach of creating PartitionJoinPaths
> without actually creating child paths and then at a later stage
> actually plan the child joins. Here's rough sketch of how that may be
> done.
> At the time of creating regular paths, we identify the join orders
> which can use partition-wise join and save those in the RelOptInfo of
> the parent table. If no such join order exists, we do not create
> PartitionJoinPaths for that relation. Otherwise, once we have
> considered all the join orders i.e. in
> generate_partition_wise_join_paths(), we create one PartitionJoinPath
> for every path that has survived in the parent or at least for every
> path that has distinct properties like pathkeys or parameterisation,
> with those properties.
> At the time of creating plans, if PartitionJoinPath is chosen, we
> actually create paths for every partition of that relation
> recursively. The path creation logic is carried out in a different
> memory context. Amongst the paths that survive, we choose the best
> path that has the same properties as PartitionJoinPath. We would
> expect all parameterized paths to be retained and any unparameterized
> path can be sorted to match the pathkeys of reference
> PartitionJoinPath. We then create the plan out of this path and copy
> it into the outer memory context and release the memory context used
> for path creation. This is similar to how prepared statements save
> their plans. Once we have the plan, the memory consumed by paths won't
> be referenced, and hence can not create problems. At the end we create
> an Append/MergeAppend plan with all the child plans and return it.
> Costing PartitionJoinPath needs more thought so that we don't end up
> with bad overall plans. Here's an idea. Partition-wise joins are
> better compared to the unpartitioned ones, because of the smaller
> sizes of partitions. If we think of join as O(MN) operation where M
> and N are sizes of unpartitioned tables being joined, partition-wise
> join computes P joins each with average O(M/P * N/P) order where P is
> the number of partitions, which is still O(MN) with constant factor
> reduced by P times. I think, we need to apply similar logic to
> costing. Let's say cost of a join is J(M, N) = S (M, N) + R (M, N)
> where S and R are setup cost and joining cost (for M, N rows) resp.
> Cost of partition-wise join would be P * J(M/P, N/P) = P * S(M/P, N/P)
> + P * R(M/P, N/P). Each of the join methods will have different S and
> R functions and may not be linear on the number of rows. So,
> PartitionJoinPath costs are obtained from corresponding regular path
> costs subjected to above transformation. This way, we will be
> protected from choosing a PartitionJoinPath when it's not optimal.
> Take example of a join where the joining relations are very small in
> size, thus hash join on full relation is optimal compared to hash join
> of each partition because of setup cost. In such a case, the function
> which calculates the cost of hash table setup, would result in almost
> same cost for full table as well as each of the partitions, thus
> increasing P * S(M/P, N/P) as compared to S(M, N).
> Let me know your comments.

I tried to measure the impact of having a memory context reset 1000
times (once for each partition) with the attached patch. Without this
patch make check in regress/ takes about 24 seconds on my laptop and
with this patch it takes 26 seconds. This is almost 10% increase in
time. I hope that's fine.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
memory_context_change.patch text/x-patch 1.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2016-11-11 12:56:33 Re: Gather Merge
Previous Message Ashutosh Bapat 2016-11-11 12:06:18 Re: [RFC] Should we fix postmaster to avoid slow shutdown?