From: Josh Berkus <josh(at)agliodbs(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Shouldn't we have a way to avoid "risky" plans?
Date: 2011-03-23 17:12:18
Message-ID: 4D8A29F2.9080105@agliodbs.com
Lists: pgsql-performance

Folks,

Yet more evidence that we need some way to assess query plans which are
high-risk and avoid them (or have Yet Another GUC):

Merge Join  (cost=29.16..1648.00 rows=382 width=78) (actual time=57215.167..57215.216 rows=1 loops=1)
  Merge Cond: (rn.node_id = device_nodes.node_id)
  ->  Nested Loop  (cost=0.00..11301882.40 rows=6998 width=62) (actual time=57209.291..57215.030 rows=112 loops=1)
        Join Filter: (node_ep.node_id = rn.node_id)
        ->  Nested Loop  (cost=0.00..11003966.85 rows=90276 width=46) (actual time=0.027..52792.422 rows=90195 loops=1)
              ->  Index Scan using ix_ne_ns on node_ep  (cost=0.00..1545943.45 rows=32606992 width=26) (actual time=0.010..7787.043 rows=32606903 loops=1)
              ->  Index Scan using ix_nefp_eid on ep_fp  (cost=0.00..0.28 rows=1 width=20) (actual time=0.001..0.001 rows=0 loops=32606903)
                    Index Cond: (ep_fp.ep_id = node_ep.ep_id)
        ->  Materialize  (cost=0.00..5.30 rows=220 width=16) (actual time=0.000..0.019 rows=220 loops=90195)
              ->  Seq Scan on mytable rn  (cost=0.00..4.20 rows=220 width=16) (actual time=0.008..0.043 rows=220 loops=1)
  ->  Sort  (cost=28.18..28.21 rows=12 width=16) (actual time=0.164..0.165 rows=10 loops=1)
        Sort Key: device_nodes.node_id
        Sort Method:  quicksort  Memory: 25kB
        ->  Index Scan using ix_dn_did on device_nodes  (cost=0.00..27.96 rows=12 width=16) (actual time=0.086..0.134 rows=10 loops=1)
              Index Cond: (dev_id = 18165)
Total runtime: 57215.329 ms

AFAICT, what's happening in this query is that PostgreSQL's statistics
on the device_nodes and several other tables are slightly out of date
(as in 5% of the table). Thus it thinks that nothing will match the
list of node_ids in "mytable", and that it can exit the merge join early
and ignore the whole huge cost of the join plan. This particular form
of out-of-dateness will be fixed in 9.1 (it's due to values being higher
than the highest histogram bucket in pg_statistic), but not all forms will be.
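For anyone hitting a similar mis-estimate, one way to confirm stale statistics is to compare the histogram the planner is working from against the live data, then re-ANALYZE. This is a generic sketch using the table and column names from the plan above, not something posted in the thread:

```sql
-- What does the planner currently believe about the join key?
-- If the top histogram bound is below the real maximum, recently
-- inserted node_ids fall "off the end" and get estimated as no-match.
SELECT tablename, attname, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename IN ('device_nodes', 'mytable')
  AND attname = 'node_id';

-- Refresh the statistics for the affected tables, then re-run
-- EXPLAIN ANALYZE on the original query to see if the plan changes.
ANALYZE device_nodes;
ANALYZE mytable;
```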

It really seems like we should be able to detect an obvious high-risk
situation like this one. Or maybe we're just being too optimistic about
discarding subplans?
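Until the planner can weigh that risk itself, the only levers are the planner GUCs. As a purely illustrative workaround (not suggested in the thread), one can disable the suspect join method for a single transaction to see what the planner falls back to:

```sql
BEGIN;
-- SET LOCAL scopes the change to this transaction only.
SET LOCAL enable_mergejoin = off;
-- Re-run the problem query here with EXPLAIN ANALYZE and compare
-- the fallback plan's actual runtime against the merge-join plan.
COMMIT;
```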

BTW, the optimal plan for this query (post-analyze) is this one:

Nested Loop  (cost=0.00..213068.26 rows=12 width=78) (actual time=0.374..0.514 rows=1 loops=1)
  Join Filter: (device_nodes.node_id = rn.node_id)
  ->  Seq Scan on mytable rn  (cost=0.00..4.20 rows=220 width=16) (actual time=0.013..0.050 rows=220 loops=1)
  ->  Materialize  (cost=0.00..213024.49 rows=12 width=62) (actual time=0.001..0.002 rows=1 loops=220)
        ->  Nested Loop  (cost=0.00..213024.43 rows=12 width=62) (actual time=0.077..0.278 rows=1 loops=1)
              ->  Nested Loop  (cost=0.00..211740.04 rows=4428 width=42) (actual time=0.070..0.269 rows=1 loops=1)
                    ->  Index Scan using ix_dn_did on device_nodes  (cost=0.00..51.92 rows=13 width=16) (actual time=0.058..0.115 rows=10 loops=1)
                          Index Cond: (dev_id = 18165)
                    ->  Index Scan using ix_ne_ns on node_ep  (cost=0.00..16137.45 rows=11700 width=26) (actual time=0.014..0.014 rows=0 loops=10)
                          Index Cond: (node_ep.node_id = device_nodes.node_id)
              ->  Index Scan using ix_nefp_eid on ep_fp  (cost=0.00..0.28 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=1)
                    Index Cond: (ep_fp.ep_id = node_ep.ep_id)

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
