Quick Links

Re: Partition-wise join for join between (declaratively) partitioned tables

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partition-wise join for join between (declaratively) partitioned tables
Date:	2017-05-18 08:38:40
Message-ID:	CAFjFpReKnTXPkX68B+vdGC8F0eJ50TYqOzUte02=G8KF5x3A0g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Apr 6, 2017 at 6:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> There's a relevant comment in 0006, build_joinrel_partition_info()
>> (probably that name needs to change, but I will do that once we have
>> settled on design)
>> + /*
>> + * Construct partition keys for the join.
>> + *
>> + * An INNER join between two partitioned relations is partition by key
>> + * expressions from both the relations. For tables A and B
>> partitioned by a and b
>> + * respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both A.a
>> + * and B.b.
>> + *
>> + * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
>> + * B.b NULL. These rows may not fit the partitioning conditions imposed on
>> + * B.b. Hence, strictly speaking, the join is not partitioned by B.b.
>> + * Strictly speaking, partition keys of an OUTER join should include
>> + * partition key expressions from the OUTER side only. Consider a join like
>> + * (A LEFT JOIN B on (A.a = B.b) LEFT JOIN C ON B.b = C.c. If we do not
>> + * include B.b as partition key expression for (AB), it prohibits us from
>> + * using partition-wise join when joining (AB) with C as there is no
>> + * equi-join between partition keys of joining relations. But two NULL
>> + * values are never equal and no two rows from mis-matching partitions can
>> + * join. Hence it's safe to include B.b as partition key expression for
>> + * (AB), even though rows in (AB) are not strictly partitioned by B.b.
>> + */
>>
>> I think that also needs to be reviewed carefully.
>
> The following passage from src/backend/optimizer/README seems highly relevant:
>
> ===
> The planner's treatment of outer join reordering is based on the following
> identities:
>
> 1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
> = (A innerjoin C on (Pac)) leftjoin B on (Pab)
>
> where Pac is a predicate referencing A and C, etc (in this case, clearly
> Pac cannot reference B, or the transformation is nonsensical).
>
> 2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
> = (A leftjoin C on (Pac)) leftjoin B on (Pab)
>
> 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
> = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
>
> Identity 3 only holds if predicate Pbc must fail for all-null B rows
> (that is, Pbc is strict for at least one column of B). If Pbc is not
> strict, the first form might produce some rows with nonnull C columns
> where the second form would make those entries null.
> ===
>
> In other words, I think your statement that null is never equal to
> null is a bit imprecise. Somebody could certainly create an operator
> that is named "=" which returns true in that case, and then they could
> say, hey, two nulls are equal (when you use that operator). The
> argument needs to be made in terms of the formal properties of the
> operator.

[.. some portion clipped .. ]

> The relevant logic is in have_partkey_equi_join:
>
> + /* Skip clauses which are not equality conditions. */
> + if (rinfo->hashjoinoperator == InvalidOid &&
> !rinfo->mergeopfamilies)
> + continue;
>
> Actually, I think the hashjoinoperator test is formally and
> practically unnecessary here; lower down there is a test to see
> whether the partitioning scheme's operator family is a member of
> rinfo->mergeopfamilies, which will certainly fail if we got through
> this test with rinfo->mergeopfamilies == NIL just on the strength of
> rinfo->hashjoinoperator != InvalidOid. So you can just bail out if
> rinfo->mergeopfamilies == NIL. But the underlying point here is that
> the only thing you really know about the function is that it's got to
> be a strategy-3 operator in some btree opclass; if that guarantees
> strictness, then so be it -- but I wasn't able to find anything in the
> code or documentation off-hand that supports that contention, so we
> might need to think a bit more about why (or if) this is guaranteed to
> be true.
>
>> Partition-wise joins
>> may be happy including partition keys from all sides, but
>> partition-wise aggregates may not be, esp. when pushing complete
>> aggregation down to partitions. In that case, rows with NULL partition
>> key, which falls on nullable side of join, will be spread across
>> multiple partitions. Proabably, we should separate nullable and
>> non-nullable partition key expressions.
>
> I don't think I understand quite what you're getting at here. Can you
> spell this out in more detail? To push an aggregate down to
> partitions, you need the grouping key to match the applicable
> partition key, and the partition key shouldn't allow nulls in more
> than one place. Now I think your point may be that outer join
> semantics could let them creep in there, e.g. SELECT b.x, sum(a.y)
> FROM a LEFT JOIN b ON a.x = b.x GROUP BY 1 -- which would indeed be a
> good test case for partitionwise aggregate. I'd be inclined to think
> that we should just give up on partitionwise aggregate in such cases;
> it's not worth trying to optimize such a weird query, at least IMHO.
> (Does this sort of case ever happen with joins? I think not, as long
> as the join operator is strict.)
>

I am revisiting NULL equality in the context of merging partition
bounds. In [1] paragraphs following

--
Do not write expression = NULL because NULL is not “equal to” NULL.
(The null value represents an unknown value, and it is not known
whether two unknown values are equal.)

seem to indicate that an equality operator should never return true
for two NULL values since it would never know whether two NULL
(unknown) values are same or not. In a paragraph above, Robert stated
that

> In other words, I think your statement that null is never equal to
> null is a bit imprecise. Somebody could certainly create an operator
> that is named "=" which returns true in that case, and then they could
> say, hey, two nulls are equal (when you use that operator). The
> argument needs to be made in terms of the formal properties of the
> operator.

But in case a user has written an = operator which returns true for
two NULL values, per description in [1], that comparison operator is
flawed and
using that operator is going to result in SQL-standard-incompliant
behaviour. I have tried to preserve all the relevant portions of
discussion in this mail. Am I missing something?

[1] https://www.postgresql.org/docs/devel/static/functions-comparison.html

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-04-06 01:07:11 from Robert Haas

Responses

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-05-19 14:16:22 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Konstantin Knizhnik	2017-05-18 08:57:57	Re: Cached plans and statement generalization
Previous Message	David Rowley	2017-05-18 08:28:56	Regression in join selectivity estimations when using foreign keys