pg 9.3 exists subselect with limit brakes query plan

From: Kószó József <k(dot)joe(at)freemail(dot)hu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: pg 9.3 exists subselect with limit brakes query plan
Date: 2014-03-18 00:36:57
Message-ID: 53279529.2070902@freemail.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,

We plan to upgrade from PostgreSQL 8.3 to 9.3 and experience some
planner-related problems in our applications with subselects using
EXISTS and LIMIT keywords. We have two similar environment with these
postgres versions, in 8.3 does not matter, if the subselect using LIMIT
or not, but in 9.3 the LIMIT keyword causes odd behavior, enforces seq
scans. In 8.3 always get similar plans to both case, but in 9.3 the
LIMIT keyword slows down the query. Additionally without index to
subselect id field the plan and running time worse with one magnitude in
9.3 than 8.3.

The two test case:

SELECT ... WHERE EXISTS (SELECT ...);

SELECT ... WHERE EXISTS (SELECT ... LIMIT 1);

We try to make some application-independent, reproducible tests to show
the difference between versions:

===== Initializations =====

# createdb limit1test -U postgres

# pgbench -i -s 10 limit1test -U postgres

limit1test=# select count(*) from pgbench_branches;
count
-------
10

limit1test=# select count(*) from pgbench_accounts;
count
---------
1000000

limit1test=# select B.bid, count(*) from pgbench_branches B join
pgbench_accounts using (bid) group by 1 order by 1;
bid | count
-----+--------
1 | 100000
2 | 100000
3 | 100000
4 | 100000
5 | 100000
6 | 100000
7 | 100000
8 | 100000
9 | 100000
10 | 100000

===== PostgreSQL 9.3 =====

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid);
---------------------------------------------------------------------
Nested Loop Semi Join (cost=0.00..26402.35 rows=10 width=4) (actual
time=0.012..4972.868 rows=10 loops=1)
Join Filter: (b.bid = a.bid)
Rows Removed by Join Filter: 4500000
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10
width=4) (actual time=0.003..0.013 rows=10 loops=1)
-> Seq Scan on pgbench_accounts a (cost=0.00..26394.00
rows=1000000 width=4) (actual time=0.002..242.321 rows=450001 loops=10)
Total runtime: 4972.904 ms

limit1test=# explain analyze select bid from pgbench_branches B where
exists (select bid from pgbench_accounts A where A.bid=B.bid limit 1);
---------------------------------------------------------------------
Seq Scan on pgbench_branches b (cost=0.00..3.99 rows=5 width=4)
(actual time=0.018..382.774 rows=10 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.29 rows=1 width=4) (actual
time=38.269..38.269 rows=1 loops=10)
-> Seq Scan on pgbench_accounts a (cost=0.00..28894.00
rows=100000 width=4) (actual time=38.261..38.261 rows=1 loops=10)
Filter: (bid = b.bid)
Rows Removed by Filter: 450000
Total runtime: 382.816 ms

# create index pgbench_accounts_bid on pgbench_accounts(bid);

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid);
--------------------------------------------------------------------
Nested Loop Semi Join (cost=0.42..2739.24 rows=10 width=4) (actual
time=0.030..0.131 rows=10 loops=1)
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10
width=4) (actual time=0.003..0.011 rows=10 loops=1)
-> Index Only Scan using pgbench_accounts_bid on pgbench_accounts a
(cost=0.42..2733.62 rows=100000 width=4) (actual time=0.010..0.010
rows=1 loops=10)
Index Cond: (bid = b.bid)
Heap Fetches: 0
Total runtime: 0.169 ms

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
--------------------------------------------------------------------
Seq Scan on pgbench_branches b (cost=0.00..3.99 rows=5 width=4)
(actual time=0.016..384.974 rows=10 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.29 rows=1 width=4) (actual
time=38.490..38.490 rows=1 loops=10)
-> Seq Scan on pgbench_accounts a (cost=0.00..28894.00
rows=100000 width=4) (actual time=38.482..38.482 rows=1 loops=10)
Filter: (bid = b.bid)
Rows Removed by Filter: 450000
Total runtime: 385.015 ms

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid*100);
--------------------------------------------------------------------
Nested Loop Semi Join (cost=0.43..2989.31 rows=10 width=4) (actual
time=0.055..0.055 rows=0 loops=1)
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10
width=4) (actual time=0.004..0.009 rows=10 loops=1)
-> Index Only Scan using pgbench_accounts_bid on pgbench_accounts a
(cost=0.43..2983.63 rows=100000 width=4) (actual time=0.002..0.002
rows=0 loops=10)
Index Cond: (bid = (b.bid * 100))
Heap Fetches: 0
Total runtime: 0.087 ms

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid*100 limit 1);
-------------------------------------------------------------------
Seq Scan on pgbench_branches b (cost=0.00..4.24 rows=5 width=4)
(actual time=1019.139..1019.139 rows=0 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 10
SubPlan 1
-> Limit (cost=0.00..0.31 rows=1 width=4) (actual
time=101.907..101.907 rows=0 loops=10)
-> Seq Scan on pgbench_accounts a (cost=0.00..31394.00
rows=100000 width=4) (actual time=101.898..101.898 rows=0 loops=10)
Filter: (bid = (b.bid * 100))
Rows Removed by Filter: 1000000
Total runtime: 1019.166 ms

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid offset 1);
------------------------------------------------------------------
Seq Scan on pgbench_branches b (cost=0.00..6.06 rows=5 width=4)
(actual time=0.042..0.166 rows=10 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.46..3542.39 rows=99997 width=4) (actual
time=0.014..0.014 rows=1 loops=10)
-> Index Only Scan using pgbench_accounts_bid on
pgbench_accounts a (cost=0.42..3542.39 rows=99998 width=4) (actual
time=0.011..0.012 rows=2 loops=10)
Index Cond: (bid = b.bid)
Heap Fetches: 0
Total runtime: 0.200 ms

===== PostgreSQL 8.3 =====

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid);
---------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..3.99 rows=5 width=4) (actual
time=0.020..436.306 rows=10 loops=1)
Filter: (subplan)
SubPlan
-> Seq Scan on accounts a (cost=0.00..28894.00 rows=100000
width=4) (actual time=43.618..43.618 rows=1 loops=10)
Filter: (bid = $0)
Total runtime: 436.348 ms

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid limit 1);
----------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..3.99 rows=5 width=4) (actual
time=0.019..442.392 rows=10 loops=1)
Filter: (subplan)
SubPlan
-> Limit (cost=0.00..0.29 rows=1 width=4) (actual
time=44.231..44.231 rows=1 loops=10)
-> Seq Scan on accounts a (cost=0.00..28894.00 rows=100000
width=4) (actual time=44.223..44.223 rows=1 loops=10)
Filter: (bid = $0)
Total runtime: 442.440 ms

# create index accounts_bid on accounts(bid);

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid);
----------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..1.55 rows=5 width=4) (actual
time=0.049..0.211 rows=10 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.017..0.017
rows=1 loops=10)
Index Cond: (bid = $0)
Total runtime: 0.258 ms

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid limit 1);
---------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..1.55 rows=5 width=4) (actual
time=0.059..0.180 rows=10 loops=1)
Filter: (subplan)
SubPlan
-> Limit (cost=0.00..0.04 rows=1 width=4) (actual
time=0.015..0.015 rows=1 loops=10)
-> Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.014..0.014
rows=1 loops=10)
Index Cond: (bid = $0)
Total runtime: 0.211 ms

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid*100);
---------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..1.57 rows=5 width=4) (actual
time=0.050..0.050 rows=0 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.003..0.003
rows=0 loops=10)
Index Cond: (bid = ($0 * 100))
Total runtime: 0.082 ms

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid*100 limit 1);
---------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..1.57 rows=5 width=4) (actual
time=0.066..0.066 rows=0 loops=1)
Filter: (subplan)
SubPlan
-> Limit (cost=0.00..0.05 rows=1 width=4) (actual
time=0.005..0.005 rows=0 loops=10)
-> Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.003..0.003
rows=0 loops=10)
Index Cond: (bid = ($0 * 100))
Total runtime: 0.095 ms

# explain analyze select bid from branches B where exists (select bid
from accounts A where A.bid=B.bid offset 1);
----------------------------------------------------------------------
Seq Scan on branches b (cost=0.00..2.00 rows=5 width=4) (actual
time=0.038..0.181 rows=10 loops=1)
Filter: (subplan)
SubPlan
-> Limit (cost=0.04..4493.36 rows=99999 width=4) (actual
time=0.016..0.016 rows=1 loops=10)
-> Index Scan using accounts_bid on accounts a
(cost=0.00..4493.36 rows=100000 width=4) (actual time=0.013..0.014
rows=2 loops=10)
Index Cond: (bid = $0)
Total runtime: 0.213 ms

===== Postgres settings =====

PostgreSQL 9.3:

version
----------------------------------------------------------------------
PostgreSQL 9.3.3 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.7.2-5) 4.7.2, 64-bit

work_mem
----------
128MB

shared_buffers
----------------
1536MB

effective_cache_size
----------------------
3GB

default_statistics_target
---------------------------
100

PostgreSQL 8.3:

version
-----------------------------------------------------------------------
PostgreSQL 8.3.14 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real
(Debian 4.3.2-1.1) 4.3.2

work_mem
----------
128MB

shared_buffers
----------------
1GB

effective_cache_size
----------------------
2GB

default_statistics_target
---------------------------
100

*_cost settings are all the same (default values), and vacuum, analyze
or reindex does not make any difference.

Any suggestions, what is causes this difference, and how to resolve it
globally? Or are these examples are too special/simple, and seqscan plan
wins in 9.3?

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
-------------------------------------------------------------------
Seq Scan on pgbench_branches b (cost=0.00..3.99 rows=5 width=4)
(actual time=0.028..379.426 rows=10 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.29 rows=1 width=4) (actual
time=37.934..37.934 rows=1 loops=10)
-> Seq Scan on pgbench_accounts a (cost=0.00..28894.00
rows=100000 width=4) (actual time=37.927..37.927 rows=1 loops=10)
Filter: (bid = b.bid)
Rows Removed by Filter: 450000
Total runtime: 379.464 ms

# set enable_seqscan=off;

# explain analyze select bid from pgbench_branches B where exists
(select bid from pgbench_accounts A where A.bid=B.bid limit 1);
-------------------------------------------------------------------
Index Only Scan using pgbench_branches_pkey on pgbench_branches b
(cost=0.14..12.82 rows=5 width=4) (actual time=0.039..0.145 rows=10 loops=1)
Filter: (SubPlan 1)
Heap Fetches: 0
SubPlan 1
-> Limit (cost=0.42..0.45 rows=1 width=4) (actual
time=0.012..0.012 rows=1 loops=10)
-> Index Only Scan using pgbench_accounts_bid on
pgbench_accounts a (cost=0.42..2850.43 rows=100000 width=4) (actual
time=0.011..0.011 rows=1 loops=10)
Index Cond: (bid = b.bid)
Heap Fetches: 0
Total runtime: 0.178 ms

Thanks,
József Kószó

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Frost 2014-03-18 00:48:36 Re: BUG #9606: pg_class relhaspkey column not updated on removal of primary key
Previous Message Venkata Balaji Nagothi 2014-03-17 23:21:10 Re: BUG #9604: Unable to access table remotely