Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-04-18 10:55:17
Message-ID: CAFjFpRcsVW8HcsXXkz+_ZFwEYRb7XwrXV63T4cuyyTgxO8-xEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 5, 2017 at 8:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Apr 4, 2017 at 10:22 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> Yes, I agree. For an inner join, the partition key types need to "shrink"
>> and for outer join they need to be "widened". I don't know if there is a way
>> to know "wider" or "shorter" of two given types. We might have to implement
>> a method to merge partition keys to produce partition key of the join, which
>> may be different from either of the partition keys. So, after-all we may
>> have to abandon the idea of canonical partition scheme. I haven't included
>> this change in the attached set of patches.
>
> I think this is why you need to regard the partitioning scheme as
> something more like an equivalence class - possibly the partitioning
> scheme should actually contain (or be?) an equivalence class. Suppose
> this is the query:
>
> SELECT * FROM i4 INNER JOIN i8 ON i4.x = i8.x;
>
> ...where i4 (x) is an int4 partitioning key and i8 (x) is an int8
> partitioning key. It's meaningless to ask whether the result of the
> join is partitioned by int4 or int8. It's partitioned by the
> equivalence class that contains both i4.x and i8.x. If the result of
> this join where joined to another table on either of those two
> columns, a second partition-wise join would be theoretically possible.
> If you insist on knowing the type of the partitioning scheme, rather
> than just the opfamily, you've boxed yourself into a corner from which
> there's no good escape.

When we merge partition bounds from two relations with different
partition key types, the merged partition bounds need to have some
information abound the way those constants look like e.g. their
length, structure etc. That's the reason we need to store partition
key types of merged partitioning scheme. Consider a three way join (i4
JOIN i8 ON i4.x = i8.x) JOIN i2 ON (i2.x = i.x). When we compare
partition bounds of i4 and i8, we use operators for int4 and int8. The
join i4 JOIN i8 will get partition bounds by merging those of i4 and
i8. When we come to join with i2, we need to know which operators to
use for comparing the partition bounds of the join with those of i2.

So, if the partition key types of the joining relations differ (but
they have matching partitioning schemes per strategy, natts and
operator family) the partition bounds of the join are converted to the
wider type among the partition key types of the joining tree.
Actually, as I am explained earlier we could choose a wider outer type
for an OUTER join and shorter type for inner join. This type is used
as partition key type of the join. In the above case join between i4
and i8 have its partition bounds converted to i8 (or i4) and then when
it is joined with i2 the partition bounds of the join are converted to
i8 (or i2).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maksim Milyutin 2017-04-18 11:13:00 Re: Proposal: Local indexes for partitioned table
Previous Message Masahiko Sawada 2017-04-18 10:55:01 Logical replication and synchronous replication