Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2018-02-15 09:11:20
Message-ID: c164d86f-22b5-4ac4-47da-fcaa6914a43c@lab.ntt.co.jp
Lists: pgsql-hackers

Hi Ashutosh.

On 2018/02/09 14:27, Ashutosh Bapat wrote:
> Here's updated patch set with those comments added.

I looked at patches 0002 and 0003.

In 0002:

+ * In case of hash partition we store modulus and remainder in datums array

In case of hash partitioned table?

+ * which has the same data type irrespective of the number of partition keys
+ * and their data types. Hence we can compare the hash bound collection without
+ * any partition key specific information.

"has the same data type" sounds like it means a Postgres data type,
whereas I think you mean that they are simple int32 values, so we don't
need any PartitionKey information to compare them.
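
To make the point concrete, here is a rough sketch (not code from the patch) of what comparing hash bounds as plain int32 values could look like. The HashBound struct and hash_bounds_equal() are hypothetical names used only for illustration; the patch itself stores these values in the datums array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one hash bound: a (modulus, remainder) pair. */
typedef struct HashBound
{
    int32_t     modulus;
    int32_t     remainder;
} HashBound;

/*
 * Compare two hash bound collections using integer comparison only.
 * No partition key information (types, collations, comparison functions)
 * is consulted anywhere.
 */
bool
hash_bounds_equal(const HashBound *a, int na, const HashBound *b, int nb)
{
    if (na != nb)
        return false;

    for (int i = 0; i < na; i++)
    {
        if (a[i].modulus != b[i].modulus ||
            a[i].remainder != b[i].remainder)
            return false;
    }
    return true;
}
```

Two tables that are both hash partitioned with modulus 4 and remainders 0 through 3 would compare as equal here, whatever the types of their partition keys.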

In 0003:

The same piece of code in partition_range_bounds_merge(),
partition_list_bounds_merge(), and merge_null_partitions() has an extra
semicolon at the end of the line starting with else if:

if (default_index < 0)
default_index = merged_index;
else if(default_index != merged_index);
{

which emits warnings like this:

partition.c: In function ‘partition_range_bounds_merge’:
partition.c:4192:11: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
else if(default_index != merged_index);
^~
partition.c: In function ‘partition_list_bounds_merge’:
partition.c:4261:11: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
else if(default_index != merged_index);
^~

I also get this warning:

partition.c:3955:1: warning: ‘is_next_range_continuous’ defined but not used [-Wunused-function]
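
The stray semicolon is more than a style issue: it becomes the entire body of the else if, so the braced block that follows is a separate statement and runs every time. The sketch below is a hypothetical reduction of the pattern, with the empty statement written on its own line to show how the compiler parses it. The body of the block is not visible in the snippet above, so the "return -1 on conflict" behavior and both function names are assumptions made only for illustration.

```c
/*
 * Buggy shape: the ';' is the whole body of the else-if, so the block
 * below it is not guarded by any condition.
 */
int
set_default_index_buggy(int default_index, int merged_index)
{
    if (default_index < 0)
        default_index = merged_index;
    else if (default_index != merged_index)
        ;                       /* empty statement ends the if/else chain */

    {
        /* Separate statement: reached on every call. */
        return -1;
    }
}

/*
 * Intended shape: the block runs only when a default index is already
 * set and differs from the merged one.
 */
int
set_default_index_fixed(int default_index, int merged_index)
{
    if (default_index < 0)
        default_index = merged_index;
    else if (default_index != merged_index)
    {
        return -1;              /* assumed meaning: conflicting defaults */
    }
    return default_index;
}
```

With the buggy shape, even a call where the two indexes agree takes the conflict path, which is presumably not what was intended.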

I'm trying to understand the optimizer side of this patch. In your commit
message, I read:

This commit allows partition-wise join to be applied under
following conditions

1. the partition bounds of joining relations are such that rows from
given partition on one side join can join with rows from maximum one
partition on the other side i.e. bounds of a given partition on one
side match/overlap with those of maximum one partition on the other
side. If the mapping happens to be m:n where m > 1 or n > 1, we have
to gang multiple partition relations together into a single relation.
This means that we have to add simple relations during join
processing, something which is not supported right now. Also, in such
a case, different pairs of joining relations can produce different
partition bounds for the same join relation, which again is not
supported right now.

So, is the currently allowed case (partition bounds on both sides match
exactly) a special case of this new feature which tries to match
partitions in a more generalized manner? I see that this patch removes
the partition_bound_equal(outer_rel->boundinfo, inner_rel->boundinfo)
check in build_joinrel_partition_info() in favor of reconciling any
differences in the representation of the partition bounds by calling
partition_bounds_merge() from try_partition_wise_join().
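
To check my understanding of condition 1, here is a simplified sketch of the kind of matching I imagine, assuming integer range bounds of the form [lo, hi) that are sorted and non-overlapping on each side. It is not the algorithm in partition_range_bounds_merge(); the Range struct and match_ranges() are hypothetical names. It maps each outer partition to at most one inner partition and gives up as soon as any partition overlaps more than one partition on the other side.

```c
#include <stdbool.h>

/* Hypothetical range partition bound: covers [lo, hi). */
typedef struct Range
{
    int         lo;
    int         hi;
} Range;

/*
 * Walk both sorted range lists in step.  On success, outer_map[i] holds
 * the index of the inner partition overlapping outer partition i, or -1
 * if there is none.  Returns false if the mapping would not be 1:1.
 */
bool
match_ranges(const Range *outer, int nouter,
             const Range *inner, int ninner, int *outer_map)
{
    int         i = 0;
    int         j = 0;
    int         last_inner = -1;    /* last inner partition already matched */

    for (int k = 0; k < nouter; k++)
        outer_map[k] = -1;

    while (i < nouter && j < ninner)
    {
        if (outer[i].hi <= inner[j].lo)
            i++;                    /* outer range lies entirely before inner */
        else if (inner[j].hi <= outer[i].lo)
            j++;                    /* inner range lies entirely before outer */
        else
        {
            /* The ranges overlap; both must still be unmatched. */
            if (outer_map[i] != -1 || last_inner == j)
                return false;
            outer_map[i] = j;
            last_inner = j;

            /* Advance past whichever range ends first (both if equal). */
            if (outer[i].hi < inner[j].hi)
                i++;
            else if (outer[i].hi > inner[j].hi)
                j++;
            else
            {
                i++;
                j++;
            }
        }
    }
    return true;
}
```

If the bounds on both sides are identical, this simply pairs partition i with partition i, which is why I would expect the exact-match case to fall out as a special case of the more general matching.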

2. For every partition on outer side that can contribute to the result
of an OUTER side, there exists at least one (taken along with item 1,
it means exactly one) matching partition on the inner side. To
support partition-wise join when the inner matching partition doesn't
exist, we have to add a dummy base relation corresponding to the
non-existent inner partition. We don't have support add base relations
during join processing.

Sorry, but I'm a bit confused by the last sentence: does it mean we cannot
allow partition-wise join to happen in this case? Yet this case is in the
list of new cases for which the patch enables partition-wise join.

Looking at the code changes under src/backend/optimizer:

+ else
+ {
+ Assert(partition_bounds_equal(part_scheme->partnatts,
+ part_scheme->parttyplen,
+ part_scheme->parttypbyval,
+ join_boundinfo, joinrel->boundinfo));

IIUC, this else block would run when try_partition_wise_join() is called
again for the same pair of relations.

+ /*
+ * Every pair of joining relations should result in the same number
+ * of child-joins.
+ */

Sorry if I'm misreading this, but does it mean: a given pair of joining
relations should always result in the same number of (set of, too?)
child-joins?

In the new comment in build_joinrel_partition_info():

+ * Because of restrictions in partition_bounds_merge(), not every pair of
+ * joining relation

joining relations

I will try to dig into partition_bounds_merge() next...

Thanks,
Amit
