Re: Bad plan for nested loop + limit

From: Alexander Staubo <alex(at)bengler(dot)no>
To: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad plan for nested loop + limit
Date: 2009-02-27 20:18:42
Message-ID: 88daf38c0902271218p1931ec9dyfdc269bceab24190@mail.gmail.com
Lists: pgsql-performance

On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com> wrote:
>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
>>>
>>> Output from "explain analyze":
>>>
>>>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>>    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>>          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>>
>> Those estimates are pretty far off. Did you try increasing the
>> statistics target? Also, is the first query repeatable (that is, is it
>> already in cache when you do the test, or alternately, are all queries
>> *out* of cache when you test?)

All right, this query keeps coming back to bite me. If this part of the join:

... and section_items.sandbox_id = 16399

matches a sufficiently large number of rows (the plan below uses
sandbox_id = 9), performance goes 'boink', like so:

Limit  (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
        ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
              Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
        ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
              Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
              Filter: (section_items.sandbox_id = 9)
Total runtime: 4348.777 ms
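
The cost numbers suggest why the planner picks this plan: with a LIMIT
on top, it appears to cost the node as an early-exit fraction of the
full join, roughly

    60521.56 * (4 / 6944) ≈ 34.86

i.e. it expects 6944 matching rows and assumes the nested loop can stop
after producing 4 of them. In reality zero rows match, so the inner
index scan runs once for each of the 145,108 event_occurrences rows
before the loop gives up.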

In this case:

# select count(*) from section_items where sandbox_id = 9;
count
-------
3126

If I remove the start_time > ... clause, performance is fine. Upping
the statistics setting on any of the columns involved seems to have no
effect.
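
For reference, "upping the statistics setting" means raising the
per-column target and re-analyzing, along these lines (the target of
1000 is just an example value):

    alter table section_items alter column sandbox_id set statistics 1000;
    alter table event_occurrences alter column start_time set statistics 1000;
    analyze section_items;
    analyze event_occurrences;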

Is this a pathological edge case, or is there something I can do to
*generally* make this query run fast? Keep in mind that the query
itself returns no rows at all. I want to avoid doing an initial
"select count(...)" just to sidestep the bad plan. Suffice it to say,
a web request that takes 5 seconds is asking too much of our users.
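
One possible workaround, sketched under the assumption that the
original query has the shape implied by the plan above (ORDER BY
start_time with LIMIT 4 over the join), is to hide the LIMIT from the
join planning with an OFFSET 0 subquery, which acts as an optimization
fence and forces the join itself to be planned for full retrieval:

    select *
    from (
        select event_occurrences.*
        from event_occurrences
        join section_items
          on section_items.subject_id = event_occurrences.event_id
         and section_items.subject_type = 'Event'
        where section_items.sandbox_id = 9
          and event_occurrences.start_time > '2009-02-27 18:01:14.739411+01'
        offset 0
    ) as matches
    order by start_time
    limit 4;

Whether this helps depends on the real query, and it gives up the
early-exit plan even in the cases where that strategy would have won.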

Alexander.
