Bad plan for nested loop + limit

From: Alexander Staubo <alex(at)bengler(dot)no>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Bad plan for nested loop + limit
Date: 2009-02-14 22:25:05
Message-ID: 88daf38c0902141425s6972d849i9dad904f55d93c77@mail.gmail.com

This dinky little query takes about 4 seconds to run:

select event_occurrences.*
from event_occurrences
join section_items on section_items.subject_id = event_occurrences.event_id
  and section_items.subject_type = 'Event'
  and section_items.sandbox_id = 16399
where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
order by event_occurrences.start_time
limit 4;

Output from "explain analyze":

Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
        ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
              ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1)
                    Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
              ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197)
                    Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
                    Filter: (section_items.sandbox_id = 16399)
        ->  Index Scan using event_instances_pkey on events  (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1)
              Index Cond: (events.id = event_occurrences.event_id)
              Filter: (events.read_permissions = (-1))
Total runtime: 4039.788 ms

Now, if I use "limit 50" instead, the planner picks a plan that is several
orders of magnitude more efficient:

Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1)
  ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1)
        Sort Key: event_occurrences.start_time
        Sort Method: quicksort Memory: 25kB
        ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1)
              ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1)
                    Recheck Cond: (sandbox_id = 16399)
                    Filter: ((subject_type)::text = 'Event'::text)
                    ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1)
                          Index Cond: (sandbox_id = 16399)
              ->  Index Scan using index_event_occurrences_on_event_id on event_occurrences  (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7)
                    Index Cond: (event_occurrences.event_id = section_items.subject_id)
                    Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
Total runtime: 0.210 ms
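
For reference, that faster plan comes from the exact same query with only the
limit changed:

select event_occurrences.*
from event_occurrences
join section_items on section_items.subject_id = event_occurrences.event_id
  and section_items.subject_type = 'Event'
  and section_items.sandbox_id = 16399
where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
order by event_occurrences.start_time
limit 50;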

Similarly, if I disable nested loop joins with "set enable_nestloop = off":

Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1)
  ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1)
        Sort Key: event_occurrences.start_time
        Sort Method: quicksort Memory: 25kB
        ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1)
              Hash Cond: (section_items.subject_id = event_occurrences.event_id)
              ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1)
                    Recheck Cond: (sandbox_id = 16399)
                    Filter: ((subject_type)::text = 'Event'::text)
                    ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1)
                          Index Cond: (sandbox_id = 16399)
              ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1)
                    ->  Seq Scan on event_occurrences  (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1)
                          Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
Total runtime: 192.496 ms
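
For clarity, that last plan came from a session-level toggle, roughly like
this (the explain analyze wrapper and the reset are paraphrased; only the
setting itself is quoted verbatim above):

set enable_nestloop = off;

explain analyze
select event_occurrences.*
from event_occurrences
join section_items on section_items.subject_id = event_occurrences.event_id
  and section_items.subject_type = 'Event'
  and section_items.sandbox_id = 16399
where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
order by event_occurrences.start_time
limit 4;

reset enable_nestloop;  -- restore the default planner behaviour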

Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         | -1
 created_at | -0.291615
 updated_at | -0.294081
 created_by | 715
 updated_by | 715
 event_id   | 2146
 start_time | -0.10047
 end_time   | 5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
        attname        | n_distinct
-----------------------+------------
 id                    | -1
 created_by            | 1612
 created_at            | -0.708649
 updated_at            | -0.83635
 updated_by            | 1190
 posted_at             | -0.930831
 section_id            | 456
 sandbox_id            | 455
 reference             | 2
 subject_id            | -0.546929
 subject_type          | 5
 conversation_id       | 1981
 read_permissions      | 8
 permission_policy_id  | 11

Anything I can do to fix the query?

This is PostgreSQL 8.3.5 with the standard planner configuration. Before
testing I reindexed, vacuumed, and analyzed the tables.
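
For completeness, "reindexed, vacuumed, and analyzed" means roughly the
following (commands paraphrased; the targets were the tables involved in the
query):

reindex table event_occurrences;
reindex table section_items;
vacuum analyze event_occurrences;
vacuum analyze section_items;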

Alexander.
