Adding "LIMIT 1" kills performance.

From: Chris Shoemaker <chris(dot)shoemaker(at)cox(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Cc: ejones(at)engineyard(dot)com
Subject: Adding "LIMIT 1" kills performance.
Date: 2008-05-29 15:47:34
Message-ID: 20080529154734.GA18273@pe.Belkin
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

[Attn list-queue maintainers: Please drop the earlier version
of this email that I accidentally sent from an unsubscribed address. ]

Hi,

I'm having a strange problem with a slow-running select query. The
query I use in production ends in "LIMIT 1", and it runs very slowly.
But when I remove the "LIMIT 1", the query runs quite quickly. This
behavior has stumped a couple smart DBAs.

The full queries and EXPLAIN ANALYZE plans are included below, but by
way of explanation/observation:

1) The "LIMIT 1" case will sometimes be quicker (but still much slower
than the non-"LIMIT 1" case) for different values of
calendar_group_id.

2) The query below is a slightly simplified version of the one I
actually use. The real one includes more conditions which explain why
each table is joined. For reference, the original query is quoted at
the end [1]. The original query exhibits the same behavior as the
simplified versions w.r.t. the "LIMIT 1" case taking _much_ longer
(even longer than the simplified version) than the non-"LIMIT 1" case,
and uses the same plans.

Can anyone explain why such a slow plan is chosen when the "LIMIT 1"
is present? Is there anything I can do to speed this query up?
Thanks.

-chris

production=> select version();
version
------------------------------------------------------------------------------
PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
(1 row)

production=> analyze calendar_groups;
ANALYZE
production=> analyze calendar_links;
ANALYZE
production=> analyze calendars;
ANALYZE
production=> analyze event_updates;
ANALYZE
production=> EXPLAIN ANALYZE SELECT event_updates.*
FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
ORDER BY event_updates.id DESC
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=16.55..91.73 rows=1 width=2752) (actual time=27810.058..27810.059 rows=1 loops=1)
-> Nested Loop (cost=16.55..695694.18 rows=9254 width=2752) (actual time=27810.054..27810.054 rows=1 loops=1)
Join Filter: (event_updates.feed_id = calendars.id)
-> Index Scan Backward using event_updates_pkey on event_updates (cost=0.00..494429.30 rows=8944370 width=2752) (actual time=0.030..7452.142 rows=5135706 loops=1)
-> Materialize (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5135706)
-> Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=0.029..0.034 rows=1 loops=1)
-> Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
Index Cond: (calendar_group_id = 3640)
-> Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
Index Cond: (calendars.id = calendar_links.source_tracker_id)
Total runtime: 27810.161 ms
(11 rows)

production=> EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
ORDER BY event_updates.id DESC;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort (cost=43376.36..43399.50 rows=9256 width=2752) (actual time=10.178..10.205 rows=36 loops=1)
Sort Key: event_updates.id
-> Nested Loop (cost=249.86..31755.56 rows=9256 width=2752) (actual time=9.957..10.098 rows=36 loops=1)
-> Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=9.868..9.873 rows=1 loops=1)
-> Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=9.824..9.825 rows=1 loops=1)
Index Cond: (calendar_group_id = 3640)
-> Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.034..0.036 rows=1 loops=1)
Index Cond: (calendars.id = calendar_links.source_tracker_id)
-> Bitmap Heap Scan on event_updates (cost=249.86..31623.01 rows=9280 width=2752) (actual time=0.080..0.138 rows=36 loops=1)
Recheck Cond: (event_updates.feed_id = calendars.id)
-> Bitmap Index Scan on index_event_updates_on_feed_id_and_feed_type (cost=0.00..247.54 rows=9280 width=0) (actual time=0.056..0.056 rows=36 loops=1)
Index Cond: (event_updates.feed_id = calendars.id)
Total runtime: 10.337 ms
(13 rows)

---------
[1] The original, unsimplified query:
SELECT event_updates.* FROM event_updates
INNER JOIN calendars ON (event_updates.feed_id = calendars.id AND event_updates.feed_type = E'Calendar')
INNER JOIN calendar_links ON (calendars.id = calendar_links.source_tracker_id AND calendars.type = E'SourceTracker')
WHERE (calendar_links.calendar_group_id = 3640 AND calendars.deactivated_at IS NULL)
ORDER BY event_updates.id DESC
LIMIT 1

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Chris Browne 2008-05-29 16:46:39 OVERLAPS is slow
Previous Message Joshua D. Drake 2008-05-29 15:45:14 Re: 2GB or not 2GB