From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-10-10 11:15:58 |
Message-ID: | 0c498546-8af0-4aec-98fa-ad4f373cf711@yandex.ru |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
On 03.09.2025 00:07, Ilia Evdokimov wrote:
>
> Hi,
>
> I've reviewed this patch, and I have suggestion about the approach.
>
> Currently, the patch extends 'convert_EXISTS_sublick_to_join' with
> quite complex logic (clause collection, volatile checks, rewriting
> join quals, etc). While it works, the amount of branching and special
> cases makes the function harder to follow.
>
> Looking at the logic, it seems that a large part of the complexity
> comes from trying to directly adapt 'convert_EXISTS_sublink_to_join'
> instead of factoring out a dedicated path. An alternative would be to
> introduce a separate function
> *'convert_EXISTS_sublink_to_lateral_join' *- with a similar API to
> 'convert_ANY_sublink_to_join'. Such a function can focus only on the
> EXISTS-to-join case, while keeping the existing function shorter and
> easier to reason about.
>
> I even made some first rough sketches of this approach (not a finished
> patch, just an outline). Of course, it would still need proper
> adaptation, but I think it demonstrates that the overall structure can
> be kept simpler.
>
> What do you think about refactoring in this direction?
>
>
I think this approach isn’t fully correct. By forming a join between a
subquery and the outer relation, you effectively force the optimizer to
choose a Nested Loop join. Furthermore, it prevents the planner from
exploring all join orders between the subquery’s tables and the outer
relation, so we may miss a more optimal plan.
With your patch, I consistently get the following plan. I even disabled
nested loops to see whether the planner could switch to a Hash Join or
Merge Join, but those aren’t applicable with lateral parameters in this
pattern.
CREATE TABLE ta (id int PRIMARY KEY, val int);
INSERT INTO ta VALUES (1,1), (2,2);
CREATE TABLE tb (id int PRIMARY KEY, aval int);
INSERT INTO tb VALUES (1,1), (2,1);
CREATE TABLE tc (id int PRIMARY KEY, aid int);
INSERT INTO tc VALUES (3,5), (1,5);
CREATE TABLE td (id int PRIMARY KEY, aid int);
INSERT INTO td VALUES (1,6), (2,7), (3,8), (4,9);
CREATE TABLE te (id int PRIMARY KEY, aid int);
INSERT INTO te VALUES (5,6), (6,7), (7,8), (4,9), (1,1);
SET enable_nestloop = OFF;
EXPLAIN ANALYZE
SELECT ta.id
FROM ta
WHERE EXISTS (
SELECT 1
FROM tb
WHERE tb.id = ta.id
AND EXISTS (SELECT 1 FROM tc WHERE tc.id = tb.id)
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (cost=0.31..37017.50 rows=2260 width=4) (actual
time=0.116..0.142 rows=2.00 loops=1)
Disabled: true
Buffers: local hit=9
-> Seq Scan on ta (cost=0.00..32.60 rows=2260 width=4) (actual
time=0.037..0.039 rows=2.00 loops=1)
Buffers: local hit=1
-> Nested Loop Semi Join (cost=0.31..16.36 rows=1 width=0) (actual
time=0.045..0.046 rows=1.00 loops=2)
Disabled: true
Buffers: local hit=8
-> Index Only Scan using tb_pkey on tb (cost=0.15..8.17
rows=1 width=4) (actual time=0.030..0.030 rows=1.00 loops=2)
Index Cond: (id = ta.id)
Heap Fetches: 2
Index Searches: 2
Buffers: local hit=4
-> Index Only Scan using tc_pkey on tc (cost=0.15..8.17
rows=1 width=4) (actual time=0.010..0.010 rows=1.00 loops=2)
Index Cond: (id = ta.id)
Heap Fetches: 2
Index Searches: 2
Buffers: local hit=4
Planning Time: 0.539 ms
Execution Time: 0.252 ms
(20 rows)
Anyway, thank you for the work and attention here - your feedback was
useful!
I’ve also rebased the patch on current master.
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patch | text/x-patch | 43.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2025-10-10 12:23:07 | Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions |
Previous Message | Holger Hoffstätte | 2025-10-10 11:10:35 | Re: [PATCH] jit: fix build with LLVM-21 |