Re: Strange workaround for slow query

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: sverhagen(at)wps-nl(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange workaround for slow query
Date: 2010-03-10 11:04:51
Message-ID: 4B977CD3.5030402@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

sverhagen(at)wps-nl(dot)com wrote:
>
> > Thanks - I'm sorry that I was not more specific earlier, but what would
> > be *really* helpful is the output of explain analyze, since that also
> > shows actual time, # rows and # loops of the inner nestloop.
>
> No problem at all.
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=20 AND (eventType_id
> IN (71)) ORDER BY datetime DESC limit 50;
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=3.23..200.31 rows=50 width=131) (actual
> time=0.070..0.341 rows=50 loops=1)
> -> Nested Loop (cost=3.23..49139.16 rows=12466 width=131) (actual
> time=0.069..0.309 rows=50 loops=1)
> -> Index Scan Backward using
> events_events_eventtype_id_datetime_ind on events_events
> (cost=0.00..48886.61 rows=12466 width=93) (actual time=0.037..0.144
> rows=50 loops=1)
> Index Cond: (eventtype_id = 71)
> -> Materialize (cost=3.23..3.24 rows=1 width=38) (actual
> time=0.001..0.001 rows=1 loops=50)
> -> Seq Scan on events_event_types (cost=0.00..3.23
> rows=1 width=38) (actual time=0.024..0.029 rows=1 loops=1)
> Filter: ((id = 71) AND (severity = 20))
> Total runtime: 0.415 ms
> (8 rows)
>
> Time: 1.290 ms
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=70 AND (eventType_id
> IN (71)) ORDER BY datetime DESC limit 50;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=3.23..200.31 rows=50 width=131) (actual
> time=11641.775..11641.775 rows=0 loops=1)
> -> Nested Loop (cost=3.23..49139.16 rows=12466 width=131) (actual
> time=11641.773..11641.773 rows=0 loops=1)
> -> Index Scan Backward using
> events_events_eventtype_id_datetime_ind on events_events
> (cost=0.00..48886.61 rows=12466 width=93) (actual
> time=0.035..11573.320 rows=50389 loops=1)
> Index Cond: (eventtype_id = 71)
> -> Materialize (cost=3.23..3.24 rows=1 width=38) (actual
> time=0.000..0.000 rows=0 loops=50389)
> -> Seq Scan on events_event_types (cost=0.00..3.23
> rows=1 width=38) (actual time=0.028..0.028 rows=0 loops=1)
> Filter: ((id = 71) AND (severity = 70))
> Total runtime: 11641.839 ms
> (8 rows)
>
> Time: 11642.902 ms
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=70 AND (eventType_id
> IN (71, 999)) ORDER BY datetime DESC LIMIT 50;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=27290.26..27290.38 rows=50 width=131) (actual
> time=0.118..0.118 rows=0 loops=1)
> -> Sort (cost=27290.26..27303.17 rows=5164 width=131) (actual
> time=0.117..0.117 rows=0 loops=1)
> Sort Key: events_events.datetime
> Sort Method: quicksort Memory: 17kB
> -> Nested Loop (cost=22.95..27118.71 rows=5164 width=131)
> (actual time=0.112..0.112 rows=0 loops=1)
> -> Seq Scan on events_event_types (cost=0.00..3.02
> rows=17 width=38) (actual time=0.016..0.041 rows=16 loops=1)
> Filter: (severity = 70)
> -> Bitmap Heap Scan on events_events
> (cost=22.95..1589.94 rows=408 width=93) (actual time=0.002..0.002
> rows=0 loops=16)
> Recheck Cond: ((events_events.eventtype_id = ANY
> ('{71,999}'::bigint[])) AND (events_events.eventtype_id =
> events_event_types.id))
> -> Bitmap Index Scan on
> test_events_events_eventtype_id_severity_ind (cost=0.00..22.85
> rows=408 width=0) (actual time=0.001..0.001 rows=0 loops=16)
> Index Cond: ((events_events.eventtype_id =
> ANY ('{71,999}'::bigint[])) AND (events_events.eventtype_id =
> events_event_types.id))
> Total runtime: 0.179 ms
> (12 rows)
>
> Time: 1.510 ms
>
>
> > I'm
> > wondering though why you do a left outer join. From the \d output in
> the
> > previous mail, events_event.eventtype_id has a not null constraint
> and a
> > fk to events_event_types.id, so an inner join would be appropriate.
> > Outer joins limits the amount of join orders the planner considers,
> so a
> > better plan might arise when the join is changed to inner.
>
> I do agree with this assessment. I'm sort of improving the performance
> of an existing implementation of ours, for which I'm not aware why
> they chose for LEFT OUTER. I did, however, test things with INNER as
> well, with the same results, so I decided to stick with what I
> encountered in the existing implementation. But it's on my mind as
> well ;-)
>
Thanks for the formatted output. The difference in speed is caused by
the first query that has to read 50k rows from an index, with filter
expression is only eventtype_id = 71, and the second has the extra
knowledge from the scan of events_event_type in the nestloops outer
loop, which returns 0 rows in all cases and is hence a lot faster, even
though that scan is called 16 times.

But the big question is: why does the planner chooses plan 1 in the
first case, or how to fix that? My $0,02 would be to 'help' the planner
find a better plan. Ofcourse you did ANALYZE, but maybe another idea is
to increase the default_statistics_target if it is still at the default
value of 10. More info on
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html. And
also to 'help' the planner: I'd just change the query to an inner join
in this case, since there cannot be null tuples in the right hand side here.

regards,
Yeb Havinga

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Benoit Delbosc 2010-03-10 13:26:20 Bad query plan inside EXISTS clause
Previous Message sverhagen 2010-03-10 10:35:18 Re: Strange workaround for slow query