Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2018-02-16 05:14:32
Message-ID: CAFjFpRdRA9rHiTiFAHu2b4+v=6t+o3Z-53QR2bWhtfrSjb7UOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 15, 2018 at 2:41 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hi Ashutosh.
>
> On 2018/02/09 14:27, Ashutosh Bapat wrote:
>
> In 0002:
>
> + * In case of hash partition we store modulus and remainder in datums array
>
> In case of hash partitioned table?
>
> + * which has the same data type irrespective of the number of partition keys
> + * and their data types. Hence we can compare the hash bound collection
> without
> + * any partition key specific information.
>
> "has the same data type" sounds like it means a Postgres data type,
> whereas I think you mean that they are simple int32 values, so we don't
> need any PartitionKey information to compare them.

Modified this comment to read
+ * Hash partition bounds store modulus and remainder in datums array which are
+ * always integers irrespective of the number of partition keys and their data
+ * types. Hence we can compare the hash bound collection without any partition
+ * key specific information. Separating this logic in a function which does not
+ * require partition key specific information allows it be called from places
+ * where the partition key specific information is not completely available.

Let me know if that looks good.

>
> In 0003:
>
> A portion of code in both partition_range_bounds_merge(),
> partition_list_bounds_merge(), and merge_null_partitions() has an extra
> semi-colon at the end of a line starting with else if:
>
> if (default_index < 0)
> default_index = merged_index;
> else if(default_index != merged_index);
> {
>
> which emits warnings like this:
>
> partition.c: In function ‘partition_range_bounds_merge’:
> partition.c:4192:11: warning: this ‘if’ clause does not guard...
> [-Wmisleading-indentation]
> else if(default_index != merged_index);
>
> ^~
> partition.c: In function ‘partition_list_bounds_merge’:
> partition.c:4261:11: warning: this ‘if’ clause does not guard...
> [-Wmisleading-indentation]
> else if(default_index != merged_index);
>

Thanks for catching those. They will cause bugs.

> Also, get this warning.
>
> partition.c:3955:1: warning: ‘is_next_range_continuous’ defined but not
> used [-Wunused-function]

I had added that function earlier, but it's no more useful in the
patch. But I have kept it there in case we need it again. I will
remove it in the final patch.

>
> I'm trying to understand the optimizer side of this patch. In your commit
> message, I read:
>
> This commit allows partition-wise join to be applied under
> following conditions
>
> 1. the partition bounds of joining relations are such that rows from
> given partition on one side join can join with rows from maximum one
> partition on the other side i.e. bounds of a given partition on one
> side match/overlap with those of maximum one partition on the other
> side. If the mapping happens to be m:n where m > 1 or n > 1, we have
> to gang multiple partition relations together into a single relation.
> This means that we have to add simple relations during join
> processing, something which is not supported right now. ALso, in such
> a case, different pairs of joining relations can produce different
> partition bounds for the same join relation, which again is not
> supported right now.
>
>
> So, is the currently allowed case (partition bounds on both sides match
> exactly) a special case of this new feature which tries to match
> partitions in a more generalized manner?

Right.

>
> 2. For every partition on outer side that can contribute to the result
> of an OUTER side, there exists at least one (taken along with item 1,
> it means exactly one) matching partition on the inner side. To
> support partition-wise join when the inner matching partition doesn't
> exist, we have to add a dummy base relation corresponding to the
> non-existent inner partition. We don't have support add base relations
> during join processing.
>
> Sorry but I'm a bit confused by the last sentence; does it mean we're not
> able to allow partition-wise join to happen in this case? But this is in
> the list of the new cases that the patch makes partition-wise join to
> happen for.
>

Consider two cases for partitioned tables t1(p1, p3, p4), t2 (p1, p2,
p3). Assume that the bounds of any pk across t1 and t2 are same when k
matches. So, p2 is missing from t1 and p4 from t2. This case will not
be handled by current partition-wise join. With these set of patches,
we will be able to handle some joins between t1 and t2. The matching
algorithm will output pairs as (p1, p1), (-, p2), (p3, p3), (p4, -).
In an inner join between t1 and t2, (-, p2), (p4, -) would not
contribute any result, so we can eliminate those. Thus t1 INNER JOIN
t2 is just (p1, p1), (p3, p3). t1 LEFT JOIN t2 will not have any
result from (-, p2), so we can eliminate it. If there would not have
been p4, we will be able to compute t1 LEFT JOIN t2. Such cases are
not handled right now.

Fixed some grammar and typos in the paragraph.

>
> Looking at the code changes under src/backend/optimizer:
>
> + else
> + {
> + Assert(partition_bounds_equal(part_scheme->partnatts,
> + part_scheme->parttyplen,
> + part_scheme->parttypbyval,
> + join_boundinfo, joinrel->boundinfo));
>
> IIUC, this else block would run when try_partition_wise_join() is called
> again for the same pair of relations.

Yes, it's all Asserts, so effectively, we run nothing in a binary
without asserts.

>
> + /*
> + * Every pair of joining relations should result in the same number
> + * of child-joins.
> + */
>
> Sorry if I'm misreading this, but does it mean: a given pair of joining
> relations should always result in the same number of (set of, too?)
> child-joins?

The assert there just check for number so the comment doesn't mention
set. But you are correct, it's the same set. We check it when we
actually deal with them by checking their relids.

>
> In the new comment in build_joinrel_partition_info():
>
> + * Because of restrictions in partition_bounds_merge(), not every pair of
> + * joining relation
>
> joining relations

Fixed.

>
>
> I will try to hop into partition_bounds_merge() from now...

Appreciate you taking time for review.

PFA updated version.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_adv_dp_join_patches_v5.tar.gz application/x-gzip 149.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-02-16 05:28:53 Re: Removing useless #include's.
Previous Message Kyotaro HORIGUCHI 2018-02-16 05:07:25 Re: reorganizing partitioning code