Quick Links

Re: Partition-wise join for join between (declaratively) partitioned tables

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partition-wise join for join between (declaratively) partitioned tables
Date:	2017-04-28 05:18:57
Message-ID:	CAFjFpRdGBcMq=MgXxVs-nzuGA7Df8Yo9NJdOED7vmDo9+fmdFA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Apr 28, 2017 at 1:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Apr 27, 2017 at 3:41 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> The third goal requires that the partition bounds be compared based on
>> the partition keys present in the equi-join. While matching the
>> partitions to be joined, the partition bounds corresponding the base
>> relation whose partition keys appear in the equi-join are used for
>> comparison using support function corresponding to the data types of
>> partition keys. This requires us to associate the partitions of a join
>> with the bounds of base relation. E.g. A(A1, A2) with bounds (X1, X3)
>> (notice missing X2), B (B1, B2) bounds (X1, X2), C (C1, C2, C3) bounds
>> (X1, X2, X3) and the join is A LJ B on A.a = B.b LJ C on B.b = C.c
>> assuming strict operators this can be executed as (AB)C or A(BC). AB
>> will have partitions A1B1, A2B3 since there is no matching bound of A
>> for B2 and A is outer relation. A1B1 is associated with bound X1 of A
>> and C both. A2B3 is associated with bound of X3, which happens to be
>> 2nd bound of A but third of B. When we join (AB) with C, we should
>> notice that C1 goes with A1B1, C2 doesn't have any matching partition
>> in AB and C3 goes with A2B3. If we compare bounds of B with C without
>> any transformation we will know C2 matches B2, but we need to look at
>> the children of AB to realize that B2 isn't present in any of the
>> children and thus C2 should not be joined with any partition of AB.
>
> Sure.
>
>> That usually looks a quadratic order operation on the number of
>> partitions.
>
> Now that I don't buy. Certainly, for range partitions, given a list
> of ranges of length M and another of length N, this can be done in
> O(M+N) time by merge-joining the lists of bounds. You pointed out
> upthread that for list partitions, things are a bit complicated
> because a single list partition can contain multiple values which are
> not necessarily contiguous, but I think that this can still be done in
> O(M+N) time. Sort all of the bounds, associating each one to a
> partition, and do a merge pass; whenever two bounds match, match the
> two corresponding partitions, but if one of those partitions is
> already matched to some other partition, then fail.
>
> For example, consider A1 FOR VALUES IN (1,3,5), A2 FOR VALUES IN
> (2,4,6), B1 FOR VALUES IN (1,6), B2 FOR VALUES IN (2,4). The sorted
> bounds for A are 1,2,3,4,5,6; for B, 1,2,4,6. The first value in both
> lists is a 1, so the corresponding partitions A1 and B1 are matched.
> The second value in both lists is a 2, so the corresponding partitions
> A2 and B2 are matched. Then we hit a 3 on the A side that has no
> match on the B side, but that's fine; we don't need to do anything.
> If the partition on the A side never got a mapping at any point during
> this merge pass, we'd eventually need to match it to a dummy partition
> (unless this is an inner join) but it's already mapped to B1 so no
> problem. Then we hit a 4 which says that A2 must match B2, which is
> consistent with what we already determine; no problem. Then we hit
> another value that only exists on the A side, which is fine just as
> before. Finally we hit a 6 on each side, which means that A2 must
> match B1, which is inconsistent with the existing mappings so we give
> up; no partitionwise join is possible here.

For two-way join this works and is fairly straight-forward. I am
assuming that A an B are base relations and not joins. But making it
work for N-way join is the challenge. I don't see your example
describing that. But I think, given your revised position below, we
don't need to get this right at this point. Remember, that the
paragraph was about 3rd goal, which according to your revised position
is now deferred.

>
> Having said that I think we could make this work, I'm starting to
> agree with you that it will add more complexity than it's worth.
> Needing to keep track of the type of every partition bound
> individually seems like a real nuisance, and it's not likely to win
> very often because, realistically, people should and generally will
> use the same type for the partitioning column in all of the relevant
> tables. So I'm going to revise my position and say it's fine to just
> give up on partitionwise join unless the types match exactly, but I
> still think we should try to cover the cases where the bounds don't
> match exactly but only 1:1 or 1:0 or 0:1 mappings are needed (iow,
> optimizations 1 and 2 from your list of 4). I agree that ganging
> partitions (optimization 4 from your list) is not something to tackle
> right now.

Good. I will have a more enjoyable vacation now.

Do you still want the patition key type to be out of partition scheme?
Keeping it there means we match it only once and save it only at a
single place. Otherwise, it will have to be stored in RelOptInfo of
the partitioned table and match it for every pair of joining
relations.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-04-27 20:02:52 from Robert Haas

Responses

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-04-28 19:07:02 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeevan Chalke	2017-04-28 05:28:30	Re: Partition-wise aggregation/grouping
Previous Message	Thomas Munro	2017-04-28 05:10:42	Re: [COMMITTERS] pgsql: Replication lag tracking for walsenders