Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-04-05 02:50:14
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Somehow I sent the old patch set again. Here's the real v17.

On Tue, Apr 4, 2017 at 7:52 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Tue, Apr 4, 2017 at 2:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Mar 30, 2017 at 1:14 AM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> > Done.
>> Ashutosh and I spent several hours discussing this patch set today.
>> I'm starting to become concerned about the fact that 0004 makes the
>> partition bounds part of the PartitionScheme, because that means you
>> can't do a partition-wise join between two tables that have any
>> difference at all in the partition bounds. It might be possible in
>> the future to introduce a notion of a compatible partition scheme, so
>> that you could say, OK, well, these two partition schemes are not
>> quite the same, but they are compatible, and we'll make a new
>> partition scheme for whatever results from reconciling them.
>> What I think *may* be better is to consider the partition bound
>> information as a property of the RelOptInfo rather than the
>> PartitionScheme. For example, suppose we're joining A with partitions
>> A1, A2, and A4 against B with partitions B1, B2, and B3 and C with
>> partitions C1, C2, and C5. With the current approach, we end up with
>> a PartitionScheme for each baserel and, not in this patch set but
>> maybe eventually, a separate PartitionScheme for each of (A B), (A C),
>> (B C), and (A B C). That seems pretty unsatisfying. If we consider
>> the PartitionScheme to only include the question of whether we're
>> doing a join on the partition keys, then if the join includes WHERE
>> a.key = b.key AND b.key = c.key, we can say that they all have the
>> same PartitionScheme up front. Then, each RelOptInfo can have a
>> separate list of bounds, like this:
>> A: 1, 2, 4
>> B: 1, 2, 3
>> C: 1, 2, 5
>> A B: 1, 2, 3, 4
>> A C: 1, 2, 4, 5
>> B C: 1, 2, 3, 5
>> A B C: 1, 2, 3, 4, 5
>> Or if it's an inner join, then instead of taking the union at each
>> level, we can take the intersection, because any partition without a
>> match on the other side of the join, then that partition can't produce
>> any rows and doesn't need to be scanned. In that case, the
>> RelOptInfos for (A B), (A C), (B, C), and (A, B, C) will all end up
>> with a bound list of 1, 2.
> I have separated partition bounds from partition scheme. The patch adds
> build_joinrel_partition_bounds() to calculate the bounds of the join
> relation and the pairs of matching partitions from the joining relation. For
> now the function just check whether both the relations have same bounds and
> returns the bounds of the first one. But in future, we will expand this
> function to merge partition bounds from the joining relation and return
> pairs of matching partitions which when joined form the partitions of the
> join according to the merged partition bounds.
> Also, moved the code to collect partition RelOptInfos from
> set_append_rel_size() to build_simple_rel(), so everything related to
> partitioning gets set in that function for a base relation.
> I think, we should rename partition scheme as PartitionKeyOptInfo and club
> partition bounds, nparts and part_rels as PartitionDescOptInfo. But I
> haven't done that in this patch yet.
>> A related question (that I did not discuss with Ashutosh, but occurred
>> to me later) is whether the PartitionScheme ought to worry about
>> cross-type joins. For instance, if A is partitioned on an int4 column
>> and B is partitioned on an int8 column, and they are joined on their
>> respective partitioning columns, can't we still do a partition-wise
>> join? We do need to require that the operator family of the operator
>> actually used in the query, the operator family used to partition the
>> inner table, and the operator family used to partition the other table
>> all match; and the collation used for the comparison in the query, the
>> collation used to partition the outer table, and the collation used to
>> partition the inner table must all match. But it doesn't seem
>> necessary to require an exact type or typmod match. In many ways this
>> seems a whole lot like the what we test when building equivalence
>> classes (cf. process_equivalence) although I'm not sure that we can
>> leverage that in any useful way.
> Yes, I agree. For an inner join, the partition key types need to "shrink"
> and for outer join they need to be "widened". I don't know if there is a way
> to know "wider" or "shorter" of two given types. We might have to implement
> a method to merge partition keys to produce partition key of the join, which
> may be different from either of the partition keys. So, after-all we may
> have to abandon the idea of canonical partition scheme. I haven't included
> this change in the attached set of patches.
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size application/zip 63.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-04-05 02:53:51 Re: multivariate statistics (v25)
Previous Message Amit Kapila 2017-04-05 02:47:42 Re: increasing the default WAL segment size