Re: Bad plan for nested loop + limit

From: Alexander Staubo <alex(at)bengler(dot)no>
To: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad plan for nested loop + limit
Date: 2009-02-27 20:18:42
Message-ID: 88daf38c0902271218p1931ec9dyfdc269bceab24190@mail.gmail.com
Lists: pgsql-performance

On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com> wrote:
>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex(at)bengler(dot)no> wrote:
>>>
>>> Output from "explain analyze":
>>>
>>>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>>    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>>          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>>
>> Those estimates are pretty far off. Did you try increasing the
>> statistics target? Also, is the first query repeatable (that is, is it
>> already in cache when you do the test, or alternately, are all queries
>> *out* of cache when you test?)

All right, this query keeps coming back to bite me. If this part of the join:

... and section_items.sandbox_id = 16399

matches a sufficiently large number of rows (the plan below uses
sandbox_id = 9), performance goes 'boink', like so:

Limit  (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
        ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
              Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
        ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
              Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
              Filter: (section_items.sandbox_id = 9)
Total runtime: 4348.777 ms
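
The cost numbers suggest why the planner picks this plan: with a LIMIT
on top, it appears to cost the node as an early-exit fraction of the
full join, roughly

    60521.56 * (4 / 6944) ≈ 34.86

i.e. it expects 6944 matching rows and assumes the nested loop can stop
after producing 4 of them. In reality zero rows match, so the inner
index scan runs once for each of the 145,108 event_occurrences rows
before the loop gives up.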

In this case:

# select count(*) from section_items where sandbox_id = 9;
count
-------
3126

If I remove the start_time > ... clause, performance is fine. Upping
the statistics setting on any of the columns involved seems to have no
effect.
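
For reference, "upping the statistics setting" means raising the
per-column target and re-analyzing, along these lines (the target of
1000 is just an example value):

    alter table section_items alter column sandbox_id set statistics 1000;
    alter table event_occurrences alter column start_time set statistics 1000;
    analyze section_items;
    analyze event_occurrences;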

Is this a pathological edge case, or is there something I can do to
*generally* make this query run fast? Keep in mind that the query
itself returns no rows at all. I want to avoid doing an initial
"select count(...)" just to sidestep the bad plan. Suffice it to say,
a web request that takes 5 seconds is asking too much of our users.
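
One possible workaround, sketched under the assumption that the
original query has the shape implied by the plan above (ORDER BY
start_time with LIMIT 4 over the join), is to hide the LIMIT from the
join planning with an OFFSET 0 subquery, which acts as an optimization
fence and forces the join itself to be planned for full retrieval:

    select *
    from (
        select event_occurrences.*
        from event_occurrences
        join section_items
          on section_items.subject_id = event_occurrences.event_id
         and section_items.subject_type = 'Event'
        where section_items.sandbox_id = 9
          and event_occurrences.start_time > '2009-02-27 18:01:14.739411+01'
        offset 0
    ) as matches
    order by start_time
    limit 4;

Whether this helps depends on the real query, and it gives up the
early-exit plan even in the cases where that strategy would have won.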

Alexander.
