Re: A problem about partitionwise join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A problem about partitionwise join
Date: 2024-03-19 02:47:49
Message-ID: CAMbWs48zxu0H7nb08W2KjuM5Pt3tMwX8FPnMH=GSoEDNHQykdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Sorry it takes me some time to get back to this thread.)

On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> I did a deeper review of the patch. Here are some comments
>

Thank you for the review!

> Approach
> --------
> The equijoin condition between partition keys doesn't appear in the join's
> restrictilist because of 'best_score' strategy as you explained well in
> [2]. What if we add an extra score for clauses between partition keys and
> give preference to equijoin between partition keys? Have you given it a
> thought? I feel that having an equijoin clause involving partition keys has
> more usages compared to a clause with any random column. E.g. nextloop may
> be able to prune partitions from inner relation if the clause contains a
> partition key.
>

Hmm, I think this approach won't work in cases where one certain pair of
partition keys has formed an EC that contains pseudoconstants. In such
cases, the EC machinery will generate restriction clauses like 'pk =
const' rather than any join clauses.

Besides, it seems to me that it's not a cheap operation to check whether
a join clause is between partition keys when we generate join clauses
from ECs in generate_join_implied_equalities().

> Documentation
> -------------
> The patch does not modify any documentation. The only documentation I
> could find about partitionwise join is the one for GUC
> 'enable_partitionwise_join'. It says
> --- quote
> "Partitionwise join currently applies only when the join conditions
> include all the partition keys, which must be of the same data type and
> have one-to-one matching sets of child partitions.".
> --- unquote
> This sentence is general and IMO covers the case this patch considers. But
> in general I feel that partitionwise join and aggregation deserve separate
> sections next to "partition pruning" in [1]; It should mention advanced
> partition matching algorithm as well. Would you be willing to write one and
> then expand it for the case in the patch?
>

I don't think it should be part of this patch to add a new section in
the docs to explain partitionwise join and aggregation. Maybe that
deserves a separate patch?

> Tests
> -----
> The patch adds a testcase for single column partitioning. I think we need
> to do better like
> 1. Test for partitioning on expression, multilevel partitioning, advanced
> partition matching. Those all might just work. Having tests helps us to
> notice any future breakage.
> 2. Some negative test case e.g. equijoin clauses with disjunction, with
> inequality operator, equality operators with operators from different
> families etc.
>

Thanks for the suggestions. We can do that.

> 3. The testcase added looks artificial. it outputs data that has same
> value for two columns which is equal to the primary key of the other table
> - when would somebody do that?. Is there some real life example where this
> change will be useful?
>

Hmm, I think the test case is good as long as it reveals the issue that
this patch fixes. It follows the same format as the existing test case
just above it. I'm not sure if there are real life examples, but I
think it may not always be necessary to derive test cases from them.

> Code
> ----
> Minor comment for now. It will be better to increment num_equal_pks
> immediately after setting pk_known_equal[ipk] = true. Otherwise the code
> gets confusing around line 2269. I will spend more time reviewing the code
> next week.
>

Hmm, the increment of num_equal_pks on line 2272 is parallel to the one
in the first loop (around line 2200). Maybe it's better to keep them
consistent as the current patch does?

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2024-03-19 02:49:57 Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Previous Message Nathan Bossart 2024-03-19 02:03:41 Re: add AVX2 support to simd.h