Re: pull-up subquery if JOIN-ON contains refs to upper-query

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date: 2025-10-10 11:15:58
Message-ID: 0c498546-8af0-4aec-98fa-ad4f373cf711@yandex.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 03.09.2025 00:07, Ilia Evdokimov wrote:
>
> Hi,
>
> I've reviewed this patch, and I have suggestion about the approach.
>
> Currently, the patch extends 'convert_EXISTS_sublick_to_join' with
> quite complex logic (clause collection, volatile checks, rewriting
> join quals, etc). While it works, the amount of branching and special
> cases makes the function harder to follow.
>
> Looking at the logic, it seems that a large part of the complexity
> comes from trying to directly adapt 'convert_EXISTS_sublink_to_join'
> instead of factoring out a dedicated path. An alternative would be to
> introduce a separate function
> *'convert_EXISTS_sublink_to_lateral_join' *- with a similar API to
> 'convert_ANY_sublink_to_join'. Such a function can focus only on the
> EXISTS-to-join case, while keeping the existing function shorter and
> easier to reason about.
>
> I even made some first rough sketches of this approach (not a finished
> patch, just an outline). Of course, it would still need proper
> adaptation, but I think it demonstrates that the overall structure can
> be kept simpler.
>
> What do you think about refactoring in this direction?
>
>
I think this approach isn’t fully correct. By forming a join between a
subquery and the outer relation, you effectively force the optimizer to
choose a Nested Loop join. Furthermore, it prevents the planner from
exploring all join orders between the subquery’s tables and the outer
relation, so we may miss a more optimal plan.

With your patch, I consistently get the following plan. I even disabled
nested loops to see whether the planner could switch to a Hash Join or
Merge Join, but those aren’t applicable with lateral parameters in this
pattern.

CREATE TABLE ta (id int PRIMARY KEY, val int);
INSERT INTO ta VALUES (1,1), (2,2);

CREATE TABLE tb (id int PRIMARY KEY, aval int);
INSERT INTO tb VALUES (1,1), (2,1);

CREATE TABLE tc (id int PRIMARY KEY, aid int);
INSERT INTO tc VALUES (3,5), (1,5);

CREATE TABLE td (id int PRIMARY KEY, aid int);
INSERT INTO td VALUES (1,6), (2,7), (3,8), (4,9);

CREATE TABLE te (id int PRIMARY KEY, aid int);
INSERT INTO te VALUES (5,6), (6,7), (7,8), (4,9), (1,1);

SET enable_nestloop = OFF;

EXPLAIN ANALYZE
SELECT ta.id
FROM ta
WHERE EXISTS (
  SELECT 1
  FROM tb
  WHERE tb.id = ta.id
    AND EXISTS (SELECT 1 FROM tc WHERE tc.id = tb.id)
);

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.31..37017.50 rows=2260 width=4) (actual
time=0.116..0.142 rows=2.00 loops=1)
   Disabled: true
   Buffers: local hit=9
   ->  Seq Scan on ta  (cost=0.00..32.60 rows=2260 width=4) (actual
time=0.037..0.039 rows=2.00 loops=1)
         Buffers: local hit=1

   ->  Nested Loop Semi Join  (cost=0.31..16.36 rows=1 width=0) (actual
time=0.045..0.046 rows=1.00 loops=2)
         Disabled: true
         Buffers: local hit=8
         ->  Index Only Scan using tb_pkey on tb (cost=0.15..8.17
rows=1 width=4) (actual time=0.030..0.030 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
         ->  Index Only Scan using tc_pkey on tc (cost=0.15..8.17
rows=1 width=4) (actual time=0.010..0.010 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
 Planning Time: 0.539 ms
 Execution Time: 0.252 ms
(20 rows)

Anyway, thank you for the work and attention here - your feedback was
useful!

I’ve also rebased the patch on current master.

Attachment Content-Type Size
v6-0001-Enables-pull-up-of-EXISTS-subqueries-that-contain-IN.patch text/x-patch 43.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2025-10-10 12:23:07 Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Previous Message Holger Hoffstätte 2025-10-10 11:10:35 Re: [PATCH] jit: fix build with LLVM-21