Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-04-04 14:22:01
Message-ID: CAFjFpRcPdAcUghMNFZuwKpD3Uq-Kp8QNLe9u-912DJqHbyK04A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Tue, Apr 4, 2017 at 2:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Mar 30, 2017 at 1:14 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> > Done.
>
> Ashutosh and I spent several hours discussing this patch set today.
> I'm starting to become concerned about the fact that 0004 makes the
> partition bounds part of the PartitionScheme, because that means you
> can't do a partition-wise join between two tables that have any
> difference at all in the partition bounds. It might be possible in
> the future to introduce a notion of a compatible partition scheme, so
> that you could say, OK, well, these two partition schemes are not
> quite the same, but they are compatible, and we'll make a new
> partition scheme for whatever results from reconciling them.
>
> What I think *may* be better is to consider the partition bound
> information as a property of the RelOptInfo rather than the
> PartitionScheme. For example, suppose we're joining A with partitions
> A1, A2, and A4 against B with partitions B1, B2, and B3 and C with
> partitions C1, C2, and C5. With the current approach, we end up with
> a PartitionScheme for each baserel and, not in this patch set but
> maybe eventually, a separate PartitionScheme for each of (A B), (A C),
> (B C), and (A B C). That seems pretty unsatisfying. If we consider
> the PartitionScheme to only include the question of whether we're
> doing a join on the partition keys, then if the join includes WHERE
> a.key = b.key AND b.key = c.key, we can say that they all have the
> same PartitionScheme up front. Then, each RelOptInfo can have a
> separate list of bounds, like this:
>
> A: 1, 2, 4
> B: 1, 2, 3
> C: 1, 2, 5
> A B: 1, 2, 3, 4
> A C: 1, 2, 4, 5
> B C: 1, 2, 3, 5
> A B C: 1, 2, 3, 4, 5
>
> Or if it's an inner join, then instead of taking the union at each
> level, we can take the intersection, because any partition without a
> match on the other side of the join, then that partition can't produce
> any rows and doesn't need to be scanned. In that case, the
> RelOptInfos for (A B), (A C), (B, C), and (A, B, C) will all end up
> with a bound list of 1, 2.
>

I have separated partition bounds from partition scheme. The patch adds
build_joinrel_partition_bounds() to calculate the bounds of the join
relation and the pairs of matching partitions from the joining relation.
For now the function just check whether both the relations have same bounds
and returns the bounds of the first one. But in future, we will expand this
function to merge partition bounds from the joining relation and return
pairs of matching partitions which when joined form the partitions of the
join according to the merged partition bounds.

Also, moved the code to collect partition RelOptInfos from
set_append_rel_size() to build_simple_rel(), so everything related to
partitioning gets set in that function for a base relation.

I think, we should rename partition scheme as PartitionKeyOptInfo and club
partition bounds, nparts and part_rels as PartitionDescOptInfo. But I
haven't done that in this patch yet.

>
> A related question (that I did not discuss with Ashutosh, but occurred
> to me later) is whether the PartitionScheme ought to worry about
> cross-type joins. For instance, if A is partitioned on an int4 column
> and B is partitioned on an int8 column, and they are joined on their
> respective partitioning columns, can't we still do a partition-wise
> join? We do need to require that the operator family of the operator
> actually used in the query, the operator family used to partition the
> inner table, and the operator family used to partition the other table
> all match; and the collation used for the comparison in the query, the
> collation used to partition the outer table, and the collation used to
> partition the inner table must all match. But it doesn't seem
> necessary to require an exact type or typmod match. In many ways this
> seems a whole lot like the what we test when building equivalence
> classes (cf. process_equivalence) although I'm not sure that we can
> leverage that in any useful way.
>
>
Yes, I agree. For an inner join, the partition key types need to "shrink"
and for outer join they need to be "widened". I don't know if there is a
way to know "wider" or "shorter" of two given types. We might have to
implement a method to merge partition keys to produce partition key of the
join, which may be different from either of the partition keys. So,
after-all we may have to abandon the idea of canonical partition scheme. I
haven't included this change in the attached set of patches.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v17.zip application/zip 76.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2017-04-04 14:22:44 Re: tuplesort_gettuple_common() and *should_free argument
Previous Message Joe Conway 2017-04-04 14:19:43 Re: partitioned tables and contrib/sepgsql