From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-06-04 10:40:07 |
Message-ID: | 87a6fcbf-929a-408a-9471-28be3080254a@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, all! I updated the patch and it looks nice. All the problems have
been solved.
On 03.04.2025 16:56, Ilia Evdokimov wrote:
>
> On 02.04.2025 19:39, Alena Rybakina wrote:
>>
>> I see that I need to add a walker that, when traversing the tree,
>> determines whether there are conditions under which pull-up is
>> impossible - the presence of
>> volatility of functions and other restrictions, and leave the
>> transformation for the var objects that I added before, I described
>> it here.
>>
>
> I have some concerns about pulling up every clause from the subquery
> with one column. In particular, not every clause is safe or beneficial
> to pull up: OR-clauses, CASE expressions, nested sublinks could
> significantly change how the planner estimates the number of rows or
> applies filters, especially when they are not true join predicates.
> Pulling them up might lead to worse plans, or even change the
> semantics in subtle ways. I think before applying such
> transformations, we should make sure they are not only safe but
> actually improve the resulting plan.
There may indeed be cases where a query plan without pull-up is worse
than with pull-up.
For example, as shown below, with pull-up we don't need to scan two
tables and perform a join, since the subquery returns 0 rows (no
matching tuples in the inner sequential scan in a parameterized Nested
Loop).
However, this cannot be detected at the current planning stage - we
simply don't have that information yet.
Do you have any ideas on how to solve this problem? So far, the only
approach I see is to try an alternative plan but I'm still learning this.
For example:
create table t(x int);
create table t1(x int);
create table t2(x int);
insert into t2 select id from generate_series(20001,30000) as id;
insert into t1 select id from generate_series(10001,20000) as id;
insert into t select id from generate_series(1,10000) as id;
vacuum analyze;
explain analyze select * from t where exists (select * from t1 join t2
on t.x = t1.x);
with my patch:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1500540.00..1500822.50 rows=10000 width=4) (actual
time=70694.658..70694.662 rows=0.00 loops=1)
Hash Cond: (t.x = t1.x)
Buffers: shared hit=135
*->* *Seq Scan on t* (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.009..1.545 *rows=10000.00* loops=1)
Buffers: shared hit=45
-> Hash (cost=1500415.00..1500415.00 rows=10000 width=4) (actual
time=70690.524..70690.526 rows=10000.00 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
Buffers: shared hit=90
-> HashAggregate (cost=1500315.00..1500415.00 rows=10000
width=4) (actual time=70683.143..70686.590 rows=10000.00 loops=1)
Group Key: t1.x
Batches: 1 Memory Usage: 793kB
Buffers: shared hit=90
*->* *Nested Loop* (cost=0.00..1250315.00 rows=100000000 width=4)
(actual time=0.019..25650.447 *rows=100000000.00* loops=1)
Buffers: shared hit=90
*-> Seq Scan on t1* (cost=0.00..145.00 rows=10000 width=4) (actual
time=0.006..4.931 *rows=10000.00* loops=1)
Buffers: shared hit=45
-> Materialize (cost=0.00..195.00 rows=10000
width=0) (actual time=0.000..0.875 rows=10000.00 loops=10000)
Storage: Memory Maximum Storage: 519kB
Buffers: shared hit=45
-> Seq Scan on t2 (cost=0.00..145.00
rows=10000 width=0) (actual time=0.007..1.246 rows=10000.00 loops=1)
Buffers: shared hit=45
Planning:
Buffers: shared hit=36 read=3
Planning Time: 0.375 ms
*Execution Time: 70695.154 ms*
without my patch:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
*Seq Scan on t* (cost=0.00..309.30 rows=5738 width=4) (actual
time=68268.562..68268.565 *rows=0.00* loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 10000
Buffers: shared hit=900045
SubPlan 1
-> *Nested Loop* (cost=0.00..8524.27 rows=654075 width=0) (actual
time=6.823..6.823 *rows=0.00* loops=10000)
Buffers: shared hit=900000
-> Seq Scan on t2 (cost=0.00..159.75 rows=11475 width=0)
(actual time=0.011..1.660 rows=10000.00 loops=10000)
Buffers: shared hit=450000
-> Materialize (cost=0.00..188.72 rows=57 width=0) (actual
time=0.000..0.000 rows=0.00 loops=100000000)
Storage: Memory Maximum Storage: 17kB
Buffers: shared hit=450000
-> Seq Scan on t1 (cost=0.00..188.44 rows=57
width=0) (actual time=2.403..2.403 *rows=0.00* loops=10000)
Filter: (t.x = x)
Rows Removed by Filter: 10000
Buffers: shared hit=450000
Planning:
Buffers: shared hit=40 read=16
Planning Time: 0.487 ms
Execution Time: *68268.600 ms*
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patch | text/x-patch | 30.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ajin Cherian | 2025-06-04 10:41:59 | Re: Restrict publishing of partitioned table with a foreign table as partition |
Previous Message | Thomas Munro | 2025-06-04 10:39:41 | Re: Custom Glibc collation version strings under LOCPATH |