Re: Using each rel as both outer and inner for JOIN_ANTI

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>
Subject: Re: Using each rel as both outer and inner for JOIN_ANTI
Date: 2022-08-09 10:54:27
Message-ID: 20220809105427.ym3ehoop7dnjame6@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Just for kicks, I ran query in your original post under EXPLAIN ANALYZE
in both patched and unpatched with this last version. I got this (best
of three):

Unpatched:
55432 16devel 437532=# explain (analyze, buffers) select * from foo left join bar on foo.a = bar.c where bar.c is null;
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Anti Join (cost=159039.00..183457.23 rows=10 width=20) (actual time=482.788..483.182 rows=10 loops=1)
Hash Cond: (foo.a = bar.c)
Buffers: shared hit=161 read=21964, temp read=8 written=8
-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.020..0.022 rows=10 loops=1)
Buffers: shared hit=1
-> Hash (cost=72124.00..72124.00 rows=5000000 width=12) (actual time=482.128..482.129 rows=0 loops=1)
Buckets: 262144 Batches: 64 Memory Usage: 2048kB
Buffers: shared hit=160 read=21964
-> Seq Scan on bar (cost=0.00..72124.00 rows=5000000 width=12) (actual time=0.092..237.431 rows=5000000 loops=1)
Buffers: shared hit=160 read=21964
Planning Time: 0.182 ms
Execution Time: 483.248 ms

Patched:
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Right Anti Join (cost=1.23..90875.24 rows=10 width=20) (actual time=457.654..457.658 rows=10 loops=1)
Hash Cond: (bar.c = foo.a)
Buffers: shared hit=33 read=22092
-> Seq Scan on bar (cost=0.00..72124.00 rows=5000000 width=12) (actual time=0.020..229.097 rows=5000000 loops=1)
Buffers: shared hit=32 read=22092
-> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.011..0.012 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) (actual time=0.006..0.007 rows=10 loops=1)
Buffers: shared hit=1
Planning Time: 0.067 ms
Execution Time: 457.679 ms

I suppose this looks good as far as the plan goes, but the cost estimation
might be a little bit too optimistic: it is reporting that the new plan
costs 50% of the original, yet the execution time is only 5% lower.

I wonder where does time go (in unpatched) when seqscanning finishes
and before hashing starts.

(I had to disable JIT for the first one, as it insisted on JITting tuple
deforming.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree. (Don Knuth)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message r.zharkov 2022-08-09 10:59:33 Re: Checking pgwin32_is_junction() errors
Previous Message Drouvot, Bertrand 2022-08-09 10:39:02 Re: [PATCH] Expose port->authn_id to extensions and triggers