Quick Links

Re: pull-up subquery if JOIN-ON contains refs to upper-query

From:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject:	Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date:	2025-06-04 10:40:07
Message-ID:	87a6fcbf-929a-408a-9471-28be3080254a@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, all! I updated the patch and it looks nice. All the problems have
been solved.

On 03.04.2025 16:56, Ilia Evdokimov wrote:
>
> On 02.04.2025 19:39, Alena Rybakina wrote:
>>
>> I see that I need to add a walker that, when traversing the tree,
>> determines whether there are conditions under which pull-up is
>> impossible - the presence of
>> volatility of functions and other restrictions, and leave the
>> transformation for the var objects that I added before, I described
>> it here.
>>
>
> I have some concerns about pulling up every clause from the subquery
> with one column. In particular, not every clause is safe or beneficial
> to pull up: OR-clauses, CASE expressions, nested sublinks could
> significantly change how the planner estimates the number of rows or
> applies filters, especially when they are not true join predicates.
> Pulling them up might lead to worse plans, or even change the
> semantics in subtle ways. I think before applying such
> transformations, we should make sure they are not only safe but
> actually improve the resulting plan.

There may indeed be cases where a query plan without pull-up is worse
than with pull-up.

For example, as shown below, with pull-up we don't need to scan two
tables and perform a join, since the subquery returns 0 rows (no
matching tuples in the inner sequential scan in a parameterized Nested
Loop).
However, this cannot be detected at the current planning stage - we
simply don't have that information yet.

Do you have any ideas on how to solve this problem? So far, the only
approach I see is to try an alternative plan but I'm still learning this.

For example:

create table t(x int);
create table t1(x int);
create table t2(x int);

insert into t2 select id from generate_series(20001,30000) as id;
insert into t1 select id from generate_series(10001,20000) as id;
insert into t select id from generate_series(1,10000) as id;
vacuum analyze;
explain analyze select * from t where exists (select * from t1 join t2
on t.x = t1.x);

with my patch:
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1500540.00..1500822.50 rows=10000 width=4) (actual
time=70694.658..70694.662 rows=0.00 loops=1)
   Hash Cond: (t.x = t1.x)
   Buffers: shared hit=135
*->* *Seq Scan on t* (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.009..1.545 *rows=10000.00* loops=1)
         Buffers: shared hit=45
   -> Hash (cost=1500415.00..1500415.00 rows=10000 width=4) (actual
time=70690.524..70690.526 rows=10000.00 loops=1)
         Buckets: 16384 Batches: 1 Memory Usage: 480kB
         Buffers: shared hit=90
         -> HashAggregate (cost=1500315.00..1500415.00 rows=10000
width=4) (actual time=70683.143..70686.590 rows=10000.00 loops=1)
               Group Key: t1.x
               Batches: 1 Memory Usage: 793kB
               Buffers: shared hit=90
*->* *Nested Loop* (cost=0.00..1250315.00 rows=100000000 width=4)
(actual time=0.019..25650.447 *rows=100000000.00* loops=1)
                     Buffers: shared hit=90
*-> Seq Scan on t1* (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.006..4.931 *rows=10000.00* loops=1)
                           Buffers: shared hit=45
                     -> Materialize (cost=0.00..195.00 rows=10000
width=0) (actual time=0.000..0.875 rows=10000.00 loops=10000)
                           Storage: Memory Maximum Storage: 519kB
                           Buffers: shared hit=45
                           -> Seq Scan on t2 (cost=0.00..145.00
rows=10000 width=0) (actual time=0.007..1.246 rows=10000.00 loops=1)
                                 Buffers: shared hit=45
Planning:
   Buffers: shared hit=36 read=3
Planning Time: 0.375 ms
*Execution Time: 70695.154 ms*

without my patch:

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
*Seq Scan on t* (cost=0.00..309.30 rows=5738 width=4) (actual
time=68268.562..68268.565 *rows=0.00* loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 10000
   Buffers: shared hit=900045
   SubPlan 1
     -> *Nested Loop* (cost=0.00..8524.27 rows=654075 width=0) (actual
time=6.823..6.823 *rows=0.00* loops=10000)
           Buffers: shared hit=900000
           -> Seq Scan on t2 (cost=0.00..159.75 rows=11475 width=0)
(actual time=0.011..1.660 rows=10000.00 loops=10000)
                 Buffers: shared hit=450000
           -> Materialize (cost=0.00..188.72 rows=57 width=0) (actual
time=0.000..0.000 rows=0.00 loops=100000000)
                 Storage: Memory Maximum Storage: 17kB
                 Buffers: shared hit=450000
                 -> Seq Scan on t1 (cost=0.00..188.44 rows=57
width=0) (actual time=2.403..2.403 *rows=0.00* loops=10000)
                       Filter: (t.x = x)
                       Rows Removed by Filter: 10000
                       Buffers: shared hit=450000
Planning:
   Buffers: shared hit=40 read=16
Planning Time: 0.487 ms
Execution Time: *68268.600 ms*

--
Regards,
Alena Rybakina
Postgres Professional

Attachment	Content-Type	Size
v4-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patch	text/x-patch	30.7 KB

In response to

Re: pull-up subquery if JOIN-ON contains refs to upper-query at 2025-04-03 13:56:52 from Ilia Evdokimov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ajin Cherian	2025-06-04 10:41:59	Re: Restrict publishing of partitioned table with a foreign table as partition
Previous Message	Thomas Munro	2025-06-04 10:39:41	Re: Custom Glibc collation version strings under LOCPATH