Re: Using each rel as both outer and inner for JOIN_ANTI

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>
Subject: Re: Using each rel as both outer and inner for JOIN_ANTI
Date: 2022-08-10 07:57:33
Message-ID: CAMbWs492b4h2-zHyFJ9EGu4bgmOUaupeUpz=7yubyip=mZjvTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 9, 2022 at 6:54 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> I suppose this looks good as far as the plan goes, but the cost estimation
> might be a little bit too optimistic: it is reporting that the new plan
> costs 50% of the original, yet the execution time is only 5% lower.

Thanks for trying this patch. Yeah, the estimated cost doesn't match the
execution time here. I tried the query locally and here is what I got
(best of three):

Unpatched:
# explain analyze select * from foo left join bar on foo.a = bar.c where
bar.c is null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=154156.00..173691.19 rows=1 width=16) (actual
time=1548.622..1548.624 rows=0 loops=1)
Hash Cond: (foo.a = bar.c)
-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual
time=0.024..0.026 rows=10 loops=1)
-> Hash (cost=72124.00..72124.00 rows=5000000 width=8) (actual
time=1443.157..1443.158 rows=5000000 loops=1)
Buckets: 262144 Batches: 64 Memory Usage: 5107kB
-> Seq Scan on bar (cost=0.00..72124.00 rows=5000000 width=8)
(actual time=0.045..481.059 rows=5000000 loops=1)
Planning Time: 0.262 ms
Execution Time: 1549.138 ms
(8 rows)

Patched:
# explain analyze select * from foo left join bar on foo.a = bar.c where
bar.c is null;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Hash Right Anti Join (cost=1.23..90875.33 rows=1 width=16) (actual
time=985.773..985.775 rows=0 loops=1)
Hash Cond: (bar.c = foo.a)
-> Seq Scan on bar (cost=0.00..72124.00 rows=5000000 width=8) (actual
time=0.095..438.333 rows=5000000 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.076..0.077
rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual
time=0.060..0.064 rows=10 loops=1)
Planning Time: 0.290 ms
Execution Time: 985.830 ms
(8 rows)

Seems the cost matches the execution time better in my local box.

The right-anti join plan has the same cost estimation with right join
plan in this case. So would you please help to test what the right join
plan looks like in your env for the query below?

select * from foo left join bar on foo.a = bar.c;

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2022-08-10 07:58:47 Re: [RFC] building postgres with meson - v10
Previous Message Drouvot, Bertrand 2022-08-10 07:52:02 Fix a typo in pgstatfuncs.c