Re: A problem about partitionwise join

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A problem about partitionwise join
Date: 2024-03-07 11:13:05
Message-ID: CAExHW5v=5igWC8BzsxmMJeMnr98pmZZBB3zF5XxHN5BEeGNe-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 22, 2024 at 2:56 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Wed, Feb 21, 2024 at 4:55 PM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
> >
> >
> > On Tue, Aug 2, 2022 at 4:24 AM Jacob Champion <jchampion(at)timescale(dot)com>
> wrote:
> >>
> >> Once you think you've built up some community support and the patchset
> >> is ready for review, you (or any interested party) can resurrect the
> >> patch entry by visiting
> >>
> >> https://commitfest.postgresql.org/38/2266/
> >>
> >> and changing the status to "Needs Review", and then changing the
> >> status again to "Move to next CF". (Don't forget the second step;
> >> hopefully we will have streamlined this in the near future!)
> >
> >
> > This patch was returned due to 'lack of interest'. However, upon
> > verification, it appears that the reported issue still exists, and the
> > proposed fix in the thread remains valid. Hence, resurrect this patch
> > after rebasing it on master. I've also written a detailed commit
> > message which hopefully can help people review the changes more
> > effectively.
>
>
I did a deeper review of the patch. Here are some comments

Approach
--------
The equijoin condition between partition keys doesn't appear in the join's
restrictilist because of 'best_score' strategy as you explained well in
[2]. What if we add an extra score for clauses between partition keys and
give preference to equijoin between partition keys? Have you given it a
thought? I feel that having an equijoin clause involving partition keys has
more usages compared to a clause with any random column. E.g. nextloop may
be able to prune partitions from inner relation if the clause contains a
partition key.

Partition pruning requires equality clauses on partition keys as well.
create_append_plan() fetches those from best_path->param_info. If we
created and saved the clauses involving partition keys somewhere
separately, similar to the clauses involving index keys, it might help this
case as well as the partition pruning code. Have you considered this idea?

There was a proposal to use ECs for outer joins as well and then use only
ECs to decide whether equijoins between partition keys exist. I don't think
the proposal has materialized. So we have to continue looking at
restrictlist as well. I don't see a point waiting for it, but others might
feel differently.

I am just trying to find ways to avoid two loops in
have_partkey_equi_join(). If the alternatives are worse, I think the
current approach is fine.

Documentation
-------------
The patch does not modify any documentation. The only documentation I could
find about partitionwise join is the one for GUC
'enable_partitionwise_join'. It says
--- quote
"Partitionwise join currently applies only when the join conditions include
all the partition keys, which must be of the same data type and have
one-to-one matching sets of child partitions.".
--- unquote
This sentence is general and IMO covers the case this patch considers. But
in general I feel that partitionwise join and aggregation deserve separate
sections next to "partition pruning" in [1]; It should mention advanced
partition matching algorithm as well. Would you be willing to write one and
then expand it for the case in the patch?

Tests
-----
The patch adds a testcase for single column partitioning. I think we need
to do better like
1. Test for partitioning on expression, multilevel partitioning, advanced
partition matching. Those all might just work. Having tests helps us to
notice any future breakage.
2. Some negative test case e.g. equijoin clauses with disjunction, with
inequality operator, equality operators with operators from different
families etc.
3. The testcase added looks artificial. it outputs data that has same value
for two columns which is equal to the primary key of the other table - when
would somebody do that?. Is there some real life example where this change
will be useful?

Code
----
Minor comment for now. It will be better to increment num_equal_pks
immediately after setting pk_known_equal[ipk] = true. Otherwise the code
gets confusing around line 2269. I will spend more time reviewing the code
next week.

[1] https://www.postgresql.org/docs/current/ddl-partitioning.html
[2]
https://www.postgresql.org/message-id/CAN_9JTxucGdVY9tV6Uxq0CdhrW98bZtxPKFbF_75qdPi5wBaow@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-03-07 11:16:05 Re: Properly pathify the union planner
Previous Message Tomas Vondra 2024-03-07 11:13:03 Re: remaining sql/json patches