From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | empty table blocks hash antijoin |
Date: | 2020-10-30 04:19:01 |
Message-ID: | CAFj8pRAwJxEQ55iYscqoDWp0hsKj0SqX=dMyXQUr2o3rVgV5UQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
I am checking one customer query where there are some empty tables in a
nontrivial query. The fixed estimation on empty tables in Postgres are
working very well, but I found another issue.
create table test_a(id int);
create table test_b(id int);
insert into test_a select generate_series(1,100000);
analyze test_a, test_b;
with zero row in test_b postgres optimizer uses nested loop
postgres=# explain analyze select * from test_a where not exists(select *
from test_b where test_a.id=test_b.id);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN
│
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Nested Loop Anti Join (cost=0.00..2693.00 rows=99999 width=4) (actual
time=0.024..90.530 rows=100000 loops=1) │
│ Join Filter: (test_a.id = test_b.id)
│
│ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.017..8.171 rows=100000 loops=1) │
│ -> Seq Scan on test_b (cost=0.00..0.00 rows=1 width=4) (actual
time=0.000..0.000 rows=0 loops=100000) │
│ Planning Time: 0.153 ms
│
│ Execution Time: 94.331 ms
│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(6 rows)
but if I add one fake row to test_b, I got hash antijoin
insert into test_b values(-1);
analyze test_b;
postgres=# explain analyze select * from test_a where not exists(select *
from test_b where test_a.id=test_b.id);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN
│
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Hash Anti Join (cost=1.02..2706.51 rows=99999 width=4) (actual
time=0.026..24.474 rows=100000 loops=1) │
│ Hash Cond: (test_a.id = test_b.id)
│
│ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.010..8.522 rows=100000 loops=1) │
│ -> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.008..0.010
rows=1 loops=1) │
│ Buckets: 1024 Batches: 1 Memory Usage: 9kB
│
│ -> Seq Scan on test_b (cost=0.00..1.01 rows=1 width=4) (actual
time=0.003..0.004 rows=1 loops=1) │
│ Planning Time: 0.186 ms
│
│ Execution Time: 28.334 ms
│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
Now the query is almost 3 times faster. Probably this is a cost issue,
because cost is very similar. With fake row I got better plan. But when I
disable hashjoin I got more expensive but better plan too
postgres=# explain analyze select * from test_a where not exists(select *
from test_b where test_a.id=test_b.id);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN
│
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Nested Loop Anti Join (cost=0.00..2944.01 rows=99999 width=4) (actual
time=0.100..47.360 rows=100000 loops=1) │
│ Join Filter: (test_a.id = test_b.id)
│
│ Rows Removed by Join Filter: 100000
│
│ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.019..8.586 rows=100000 loops=1) │
│ -> Materialize (cost=0.00..1.01 rows=1 width=4) (actual
time=0.000..0.000 rows=1 loops=100000) │
│ -> Seq Scan on test_b (cost=0.00..1.01 rows=1 width=4) (actual
time=0.006..0.008 rows=1 loops=1) │
│ Planning Time: 0.176 ms
│
│ Execution Time: 51.248 ms
│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
On empty table the Materialize node helps 50%
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2020-10-30 04:32:03 | Re: Disable WAL logging to speed up data loading |
Previous Message | Tatsuo Ishii | 2020-10-30 04:17:08 | Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8 |