Forcing postgresql to use an index

From: Eugene Morozov <eugene(at)cactus-mouse(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Forcing postgresql to use an index
Date: 2009-09-08 14:12:21
Message-ID: 87ocpl7cfu.fsf@eugenemorozov.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I have a following query (autogenerated by Django)

SELECT activity_activityevent.id, activity_activityevent.user_id, activity_activityevent.added_on
FROM activity_activityevent
WHERE activity_activityevent.user_id IN (
SELECT U0.user_id
FROM profile U0
INNER JOIN profile_friends U1
ON U0.user_id = U1.to_profile_id
WHERE U1.from_profile_id = 5
)
ORDER BY activity_activityevent.added_on DESC LIMIT 10

When I run EXPLAIN ANALYZE with my default settings (seq scan is on,
random_page_cost = 4) I get the following result:

Limit (cost=4815.62..4815.65 rows=10 width=202) (actual time=332.938..332.977 rows=10 loops=1)
-> Sort (cost=4815.62..4816.35 rows=292 width=202) (actual time=332.931..332.945 rows=10 loops=1)
Sort Key: activity_activityevent.added_on
Sort Method: top-N heapsort Memory: 19kB
-> Hash IN Join (cost=2204.80..4809.31 rows=292 width=202) (actual time=12.856..283.916 rows=15702 loops=1)
Hash Cond: (activity_activityevent.user_id = u0.user_id)
-> Seq Scan on activity_activityevent (cost=0.00..2370.43 rows=61643 width=202) (actual time=0.020..126.129 rows=61643 loops=1)
-> Hash (cost=2200.05..2200.05 rows=380 width=8) (actual time=12.777..12.777 rows=424 loops=1)
-> Nested Loop (cost=11.20..2200.05 rows=380 width=8) (actual time=0.260..11.594 rows=424 loops=1)
-> Bitmap Heap Scan on profile_friends u1 (cost=11.20..62.95 rows=380 width=4) (actual time=0.228..1.202 rows=424 loops=1)
Recheck Cond: (from_profile_id = 5)
-> Bitmap Index Scan on profile_friends_from_profile_id_key (cost=0.00..11.10 rows=380 width=0) (actual time=0.208..0.208 rows=424 loops=1)
Index Cond: (from_profile_id = 5)
-> Index Scan using profile_pkey on profile u0 (cost=0.00..5.61 rows=1 width=4) (actual time=0.012..0.015 rows=1 loops=424)
Index Cond: (u0.user_id = u1.to_profile_id)
Total runtime: 333.190 ms

But when I disable seq scan or set random_page_cost to 1.2 (higher
values doesn't change the plan), postgres starts using index and query
runs two times faster:

Limit (cost=9528.36..9528.38 rows=10 width=202) (actual time=165.047..165.090 rows=10 loops=1)
-> Sort (cost=9528.36..9529.09 rows=292 width=202) (actual time=165.042..165.058 rows=10 loops=1)
Sort Key: activity_activityevent.added_on
Sort Method: top-N heapsort Memory: 19kB
-> Nested Loop (cost=2201.00..9522.05 rows=292 width=202) (actual time=13.074..126.209 rows=15702 loops=1)
-> HashAggregate (cost=2201.00..2204.80 rows=380 width=8) (actual time=12.996..14.131 rows=424 loops=1)
-> Nested Loop (cost=11.20..2200.05 rows=380 width=8) (actual time=0.263..11.665 rows=424 loops=1)
-> Bitmap Heap Scan on profile_friends u1 (cost=11.20..62.95 rows=380 width=4) (actual time=0.232..1.181 rows=424 loops=1)
Recheck Cond: (from_profile_id = 5)
-> Bitmap Index Scan on profile_friends_from_profile_id_key (cost=0.00..11.10 rows=380 width=0) (actual time=0.210..0.210 rows=424 loops=1)
Index Cond: (from_profile_id = 5)
-> Index Scan using profile_pkey on profile u0 (cost=0.00..5.61 rows=1 width=4) (actual time=0.013..0.016 rows=1 loops=424)
Index Cond: (u0.user_id = u1.to_profile_id)
-> Index Scan using activity_activityevent_user_id on activity_activityevent (cost=0.00..18.82 rows=35 width=202) (actual time=0.014..0.130 rows=37 loops=424)
Index Cond: (activity_activityevent.user_id = u0.user_id)
Total runtime: 165.323 ms

Can anyone enlighten me? Should I set random_page_cost to 1.2
permanently (I feel this is not a really good idea in my case)?

Eugene

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2009-09-08 14:30:12 Re: Using Gprof with Postgresql
Previous Message Pierre Frédéric Caillaud 2009-09-08 12:44:51 Re: Using Gprof with Postgresql