From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Fan <zhihuifan1213(at)163(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
Subject: | Re: Pathify RHS unique-ification for semijoin planning |
Date: | 2025-09-02 10:10:23 |
Message-ID: | CAMbWs49xs52ka8o_xxy1Ub_+m7bw9=uyf8GS8SMhxdkHauQ92A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 23, 2025 at 5:11 PM Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
> As a very trivial test on this patch, I ran the query in your opening
> email, both with and without the patch, scaling up the size of the table
> a little bit.
> This is a really nice improvement. I think we could find queries that
> are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
> nodes.
FWIW, I'm looking for a query to better showcase the performance
improvement from this patch. Here is one I found.
create table t (a int, b int);
insert into t select i%10, i%10 from generate_series(1,50000) i;
create index on t (a, b);
analyze t;
explain (analyze, costs on)
select * from t t1, t t2 where (t1.a, t2.b) in (select a, b from t t3)
order by t1.a, t2.b;
Here are the planning and execution time on my snail-paced machine
(best of 3), without and with this patch.
-- without this patch
Planning Time: 0.850 ms
Execution Time: 108149.907 ms
-- with this patch
Planning Time: 0.728 ms
Execution Time: 29229.748 ms
So this specific case runs about 3.7 times faster, which is really
nice.
- Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Mihail Nikalayeu | 2025-09-02 10:44:27 | Re: Adding REPACK [concurrently] |
Previous Message | shveta malik | 2025-09-02 10:00:21 | Re: Conflict detection for update_deleted in logical replication |