Index Choice Problem

From: Adam Alkins <adam(dot)alkins(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Index Choice Problem
Date: 2006-02-16 17:03:29
Message-ID: e5edd73e0602160903y16a0bd3ao2a0dd817fdf72cc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi List,

I would like some insight from the experts here as to how I can alter
which index PostgreSQL is choosing to run a query.

First off, I'm running an active web forum (phpBB) with sometimes
hundreds of concurrent users. The query in question is one which pulls
the lists of topics in the forum. The table in question is here:

--

forums=> \d phpbb_topics;
Table "public.phpbb_topics"
Column | Type |
Modifiers
----------------------+-----------------------+-------------------------------------------------------
topic_id | integer | not null default
nextval('phpbb_topics_id_seq'::text)
forum_id | integer | not null default 0
topic_title | character varying(60) | not null default
''::character varying
topic_poster | integer | not null default 0
topic_time | integer | not null default 0
topic_views | integer | not null default 0
topic_replies | integer | not null default 0
topic_status | smallint | not null default (0)::smallint
topic_vote | smallint | not null default (0)::smallint
topic_type | smallint | not null default (0)::smallint
topic_first_post_id | integer | not null default 0
topic_last_post_id | integer | not null default 0
topic_moved_id | integer | not null default 0
topic_last_post_time | integer | not null default 0
Indexes:
"forum_id_phpbb_topics_index" btree (forum_id)
"topic_id_phpbb_topics_index" btree (topic_id)
"topic_last_post_id_phpbb_topics_index" btree (topic_last_post_id)
"topic_last_post_time_phpbb_topics_index" btree (topic_last_post_time)
"topic_moved_id_phpbb_topics_index" btree (topic_moved_id)

--

To layout the contents of the table, here are some relevant queries
showing the number of entries

forums=# SELECT COUNT(*) FROM phpbb_topics; SELECT COUNT(*) FROM
phpbb_topics WHERE forum_id = 71; SELECT COUNT(*) FROM phpbb_topics
WHERE forum_id = 55;
count
--------
190588
(1 row)

count
-------
1013
(1 row)

count
-------
35035
(1 row)

--

Ok. Now, here's the problem. I run a query to pull the list of topics
for the forum. There pagination, so the first page query looks like
this:

SELECT t.topic_id
FROM phpbb_topics AS t
WHERE t.forum_id = 71
AND t.topic_id NOT IN (205026, 29046, 144569, 59780, 187424,
138635, 184973, 170551, 22419, 181690, 197254, 205130)
ORDER BY t.topic_last_post_time DESC
LIMIT 23 OFFSET 0




QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3487.78..3487.87 rows=34 width=8) (actual
time=1112.921..1113.005 rows=34 loops=1)
-> Sort (cost=3486.15..3489.10 rows=1181 width=8) (actual
time=1112.087..1112.535 rows=687 loops=1)
Sort Key: topic_last_post_time
-> Index Scan using forum_id_phpbb_topics_index on
phpbb_topics t (cost=0.00..3425.89 rows=1181 width=8) (actual
time=54.650..1109.877 rows=1012 loops=1)
Index Cond: (forum_id = 71)
Filter: (topic_id <> 205026)
Total runtime: 1113.268 ms
(7 rows)

--

This is the query on one of the lesser active forums (forum_id = 71)
which as list earlier only has 1013 rows. This query slow because
PostgreSQL is not using the index on the "forum_id" column, but
instead scanning through the topics via the topic_last_post_time and
filtering through the posts. This would be good for the forum_id = 55
where the most recent topics would be quickly found. Now here's the
stranger part, going deeper into the results (ie selecting pages
further down), the planner does this:

--

SELECT t.topic_id
FROM phpbb_topics AS t
WHERE t.forum_id = 71
AND t.topic_id NOT IN (205026)
ORDER BY t.topic_last_post_time DESC
LIMIT 34 OFFSET 653


QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3487.78..3487.87 rows=34 width=8) (actual
time=6.140..6.202 rows=34 loops=1)
-> Sort (cost=3486.15..3489.10 rows=1181 width=8) (actual
time=5.306..5.753 rows=687 loops=1)
Sort Key: topic_last_post_time
-> Index Scan using forum_id_phpbb_topics_index on
phpbb_topics t (cost=0.00..3425.89 rows=1181 width=8) (actual
time=0.070..3.581 rows=1012 loops=1)
Index Cond: (forum_id = 71)
Filter: (topic_id <> 205026)
Total runtime: 6.343 ms
(7 rows)

--

This is more like how it should be done IMO. Results are much faster
when the forum id index is used. Now, the output of the first query on
the forum_id = 55 looks like this

--

SELECT t.topic_id
FROM phpbb_topics AS t
WHERE t.forum_id = 55
AND t.topic_id NOT IN (159934, 168973, 79609, 179029, 61593,
184896, 190572)
ORDER BY t.topic_last_post_time DESC
LIMIT 28 OFFSET 0


QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..50.50 rows=28 width=8) (actual time=0.060..0.714
rows=28 loops=1)
-> Index Scan Backward using
topic_last_post_time_phpbb_topics_index on phpbb_topics t
(cost=0.00..63232.38 rows=35063 width=8) (actual time=0.057..0.675
rows=28 loops=1)
Filter: ((forum_id = 55) AND (topic_id <> 159934) AND
(topic_id <> 168973) AND (topic_id <> 79609) AND (topic_id <> 179029)
AND (topic_id <> 61593) AND (topic_id <> 184896) AND (topic_id <>
190572))
Total runtime: 0.794 ms

--

This is acceptable usage when the forum_id is heavily populated. Next
now, here again puzzles me, pulling entries in the middle of forum_id
= 55

--

SELECT t.topic_id
FROM phpbb_topics AS t
WHERE t.forum_id = 55
AND t.topic_id NOT IN (159934, 168973)
ORDER BY t.topic_last_post_time DESC
LIMIT 33 OFFSET 17458

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=29302.43..29302.51 rows=33 width=8) (actual
time=625.907..625.969 rows=33 loops=1)
-> Sort (cost=29258.78..29346.44 rows=35064 width=8) (actual
time=603.710..615.411 rows=17491 loops=1)
Sort Key: topic_last_post_time
-> Seq Scan on phpbb_topics t (cost=0.00..26611.85
rows=35064 width=8) (actual time=0.067..528.271 rows=35034 loops=1)
Filter: ((forum_id = 55) AND (topic_id <> 159934) AND
(topic_id <> 168973))
Total runtime: 632.444 ms
(6 rows)

--

Why is it doing a sequential scan? :(

My questions... is there a method for me to suggest which index to use
in the query. I'm think adding logic in my script depending on which
forum_id is used (since I can hard code in my scripts which are the
popular forums) and tell the planner to use a specific index first?

Secondly, why in the last output did it opt to do a sequential scan
over using the forum_id index as it did earlier.

Side note, a vacuum analayze was done just prior to running these tests.

Thank you,
--
Adam Alkins
http://www.rasadam.com
Mobile: 868-680-4612

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-02-16 17:15:08 Re: qsort again (was Re: [PERFORM] Strange Create Index
Previous Message Patrick Carriere 2006-02-16 17:03:17 Future of Table Partitioning