From: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Fan <zhihuifan1213(at)163(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
Subject: | Re: Pathify RHS unique-ification for semijoin planning |
Date: | 2025-07-23 08:11:03 |
Message-ID: | 202507230811.ueghjzzloxkm@alvherre.pgsql |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
As a very trivial test on this patch, I ran the query in your opening
email, both with and without the patch, scaling up the size of the table
a little bit. So I did this
drop table if exists t;
create table t(a int, b int);
insert into t select i % 100000, i from generate_series(1,1e7) i;
create index on t(a);
vacuum analyze t;
set enable_hashagg to off;
explain (costs off, analyze, buffers)
select * from t t1 where t1.a in
(select a from t t2 where a < 10000)
order by t1.a;
This is the plan without the patch:
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Merge Join (actual time=289.262..700.761 rows=1000000.00 loops=1)
Merge Cond: (t1.a = t2.a)
Buffers: shared hit=1017728 read=3945 written=3361, temp read=1471 written=1476
-> Index Scan using t_a_idx on t t1 (actual time=0.011..320.747 rows=1000001.00 loops=1)
Index Searches: 1
Buffers: shared hit=997725 read=3112 written=2664
-> Sort (actual time=219.273..219.771 rows=10000.00 loops=1)
Sort Key: t2.a
Sort Method: quicksort Memory: 385kB
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Unique (actual time=128.173..218.708 rows=10000.00 loops=1)
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Sort (actual time=128.170..185.461 rows=1000000.00 loops=1)
Sort Key: t2.a
Sort Method: external merge Disk: 11768kB
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Index Only Scan using t_a_idx on t t2 (actual time=0.024..74.171 rows=1000000.00 loops=1)
Index Cond: (a < 10000)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=20003 read=833 written=697
Planning:
Buffers: shared hit=28 read=7
Planning Time: 0.212 ms
Execution Time: 732.840 ms
and this is the plan with the patch:
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────
Merge Join (actual time=70.310..595.116 rows=1000000.00 loops=1)
Merge Cond: (t1.a = t2.a)
Buffers: shared hit=1017750 read=3923 written=3586
-> Index Scan using t_a_idx on t t1 (actual time=0.020..341.257 rows=1000001.00 loops=1)
Index Searches: 1
Buffers: shared hit=996914 read=3923 written=3586
-> Unique (actual time=0.028..99.074 rows=10000.00 loops=1)
Buffers: shared hit=20836
-> Index Only Scan using t_a_idx on t t2 (actual time=0.026..66.219 rows=1000000.00 loops=1)
Index Cond: (a < 10000)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=20836
Planning:
Buffers: shared hit=55 read=15 written=14
Planning Time: 0.391 ms
Execution Time: 621.377 ms
This is a really nice improvement. I think we could find queries that
are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
nodes.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)
From | Date | Subject | |
---|---|---|---|
Next Message | Álvaro Herrera | 2025-07-23 08:12:39 | Re: Proposal: QUALIFY clause |
Previous Message | Michael Paquier | 2025-07-23 08:09:54 | Re: Custom pgstat support performance regression for simple queries |