Re: A problem about partitionwise join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A problem about partitionwise join
Date: 2024-03-25 03:30:55
Message-ID: CAMbWs48AOLfx+weJ6=2U1DuxBsWDnC3LBrQr9QXoDq6TBq+H6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 19, 2024 at 3:40 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Tue, Mar 19, 2024 at 8:18 AM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
>
>> On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat <
>> ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>>
>>> Approach
>>> --------
>>> The equijoin condition between partition keys doesn't appear in the
>>> join's restrictilist because of 'best_score' strategy as you explained well
>>> in [2]. What if we add an extra score for clauses between partition keys
>>> and give preference to equijoin between partition keys? Have you given it a
>>> thought? I feel that having an equijoin clause involving partition keys has
>>> more usages compared to a clause with any random column. E.g. nextloop may
>>> be able to prune partitions from inner relation if the clause contains a
>>> partition key.
>>>
>>
>> Hmm, I think this approach won't work in cases where one certain pair of
>> partition keys has formed an EC that contains pseudoconstants. In such
>> cases, the EC machinery will generate restriction clauses like 'pk =
>> const' rather than any join clauses.
>>
>
> That should be ok and more desirable. Clauses like pk = const will leave
> only one partition around in each of the joining relations thus PWJ won't
> be required OR it will be automatic - whichever way you see it.
>

No, that's not true. There could be multiple partition keys, and the
particular key involved in the pushed-down restriction 'pk = const' may
not be able to prune away any partitions. To be concrete, consider the
query:

create table p (k1 int, k2 int, val int) partition by range(k1, k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);

set enable_partitionwise_join to on;

explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2
and foo.k2 = 5;
QUERY PLAN
-----------------------------------------
Hash Join
Hash Cond: (foo.k1 = bar.k1)
-> Append
-> Seq Scan on p_1 foo_1
Filter: (k2 = 5)
-> Seq Scan on p_2 foo_2
Filter: (k2 = 5)
-> Hash
-> Append
-> Seq Scan on p_1 bar_1
Filter: (k2 = 5)
-> Seq Scan on p_2 bar_2
Filter: (k2 = 5)
(13 rows)

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-25 04:18:23 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Masahiko Sawada 2024-03-25 03:13:19 Re: [PoC] Improve dead tuple storage for lazy vacuum