Re: Partition-wise join for join between (declaratively) partitioned tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-18 21:05:43
Message-ID: CA+TgmoaCG72nMadCmAj7XPNhaeashj7TDjcvvQO6F4My8pQjgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 18, 2017 at 8:02 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> partition pruning might need partexprs look up relevant quals, but
> nullable_partexprs doesn't have any use there. So may be we should add
> nullable_partexpr to RelOptInfo as part of 0002 (partition-wise join
> implementation) instead of 0001. What do you think?

+1.

>> - I'm not entirely sure whether maintaining partexprs and
>> nullable_partexprs is the right design. If I understand correctly,
>> whether or not a partexpr is nullable is really a per-RTI property,
>> not a per-expression property. You could consider something like
>> "Relids nullable_rels".
>
> That's true. However in order to decide whether an expression falls on
> nullable side of a join, we will need to call pull_varnos() on it and
> check the output against nullable_rels. Separating the expressions
> themselves avoids that step.

Good point. Also, I'm not sure about cases like this:

SELECT * FROM (SELECT b.x, b.y FROM a LEFT JOIN b ON a.x = b.x WHERE
a.y = b.y) w LEFT JOIN c ON w.x = c.x AND w.y = c.y;

Suppose the relations are all partitioned by (x, y) but that the =
operator is not strict. A partition-wise join is valid between a and
b, but we can't regard w as partitioned any more, because w.x might
contain nulls in partitions where the partitioning scheme wouldn't
allow them. On the other hand, if the subquery were to select a.x,
a.y then clearly it would be fine: there would be no possibility of a
NULL having been substituted for a proper value.

What if the subquery selected a.x, b.y? Initially, I thought that
would be OK too, because of the fact that the a.y = b.y clause is in
the WHERE clause rather than the join condition. But on further
thought I think that probably doesn't work, because with = being a
non-strict operator there's no guarantee that it would remove any
nulls introduced by the left join. Of course, if the subselect had a
WHERE clause saying that b.x/b.y IS NOT NULL then having the SELECT
list mention those columns would be fine.

>> - The naming of enable_partition_wise_join might also need some
>> thought. What happens when we also have partition-wise aggregate?
>> What about the proposal to strength-reduce MergeAppend to Append --
>> would that use this infrastructure? I wonder if we out to call this
>> enable_partition_wise or enable_partition_wise_planning to make it a
>> bit more general. Then, too, I've never really liked having
>> partition_wise in the GUC name because it might make someone think
>> that it makes you partitions have a lot of wisdom. Removing the
>> underscore might help: partitionwise. Or maybe there is some whole
>> different name that would be better. If anyone wants to bikeshed,
>> now's the time.
>
> partitions having a lot of wisdom would be wise_partitions rather than
> partition_wise ;).

Well, maybe it's the joins that have a lot of wisdom, then.
enable_partition_wise_join could be read to mean that we should allow
partitioning of joins, but only if those joins know the secret of true
happiness.

> If partition-wise join is disabled, partition-wise aggregates,
> strength reduction of MergeAppend won't be possible on a join tree,
> but those will be possible on a base relation. Even if partition-wise
> join enabled, one may want to disable other partition-wise
> optimizations individually. So, they are somewhat independent
> switches. I don't think we should bundle all of those into one.
> Whatever names we choose for those GUCs, I think they should have same
> naming convention e.g. "partition_wise_xyz". I am open to suggestions
> about the names.

I think the chances of you getting multiple GUCs for different
partition-wise optimizations past Tom are pretty low.

>> - Instead of reorganizing add_paths_to_append_rel as you did, could
>> you just add an RTE_JOIN case to the switch? Not sure if there's some
>> problem with that idea, but it seems like it might come out nicer.
>
> RTE_JOIN is created only for joins specified using JOIN clause i.e
> syntactic joins. The joins created during query planner like rel1,
> rel2, rel3 do not have RTE_JOIN. So, we can't use RTE_JOIN there.

OK, never mind that then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-18 21:07:58 Re: Boom filters for hash joins (was: A design for amcheck heapam verification)
Previous Message Tom Lane 2017-09-18 20:39:59 Re: Small code improvement for btree