Getting rid of a seq scan in query on a large table

From: Jens Hoffrichter <jens(dot)hoffrichter(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Getting rid of a seq scan in query on a large table
Date: 2011-06-27 12:46:46
Message-ID: BANLkTik0KVOg8itw_ud-s8_TtK9kzt-O=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi everyone,

I'm having trouble getting rid of a sequential scan on a table with roughly
120k entries it. Creation of an index on that particular column which
triggers the sequential scan doesn't do anything, VACUUM and ANALYZE has
been done on the table.

The table in question has the following definition:

Column | Type |
Modifiers
--------------------+--------------------------+------------------------------------------------------------------
post_id | bigint | not null default
nextval('posts_post_id_seq'::regclass)
forum_id | bigint | not null
threadlink | character varying(255) | not null
timestamp | timestamp with time zone | not null
poster_id | bigint |
thread_id | bigint | not null
subject | text | not null
text | text | not null
postername | character varying(255) |
internal_post_id | bigint | not null default
nextval('posts_internal_post_id_seq'::regclass)
internal_thread_id | bigint |
Indexes:
"posts_pkey" PRIMARY KEY, btree (internal_post_id)
"posts_forum_id_key" UNIQUE, btree (forum_id, post_id)
"idx_internal_thread_id" btree (internal_thread_id)
"idx_posts_poster_id" btree (poster_id)
Foreign-key constraints:
"posts_forum_id_fkey" FOREIGN KEY (forum_id) REFERENCES forums(forum_id)
"posts_internal_thread_id_fkey" FOREIGN KEY (internal_thread_id)
REFERENCES threads(internal_thread_id)
"posts_poster_id_fkey" FOREIGN KEY (poster_id) REFERENCES
posters(poster_id)

The query is this:

SELECT threads.internal_thread_id AS threads_internal_thread_id,
threads.forum_id AS threads_forum_id, threads.thread_id AS
threads_thread_id, threads.title AS threads_title, threads.poster_id AS
threads_poster_id, threads.postername AS threads_postername,
threads.category AS threads_category, threads.posttype AS threads_posttype

FROM threads JOIN posts ON threads.internal_thread_id =
posts.internal_thread_id JOIN posters ON posts.poster_id = posters.poster_id
JOIN posters_groups AS posters_groups_1 ON posters.poster_id =
posters_groups_1.poster_id JOIN groups ON groups.group_id =
posters_groups_1.group_id WHERE groups.group_id = 4 ORDER BY posts.timestamp
DESC;

The query plan (with an explain analyze) gives me the following:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=13995.93..14006.63 rows=4279 width=108) (actual
time=79.927..79.947 rows=165 loops=1)
Sort Key: posts."timestamp"
Sort Method: quicksort Memory: 50kB
-> Nested Loop (cost=6.97..13737.84 rows=4279 width=108) (actual
time=0.605..79.693 rows=165 loops=1)
-> Seq Scan on groups (cost=0.00..1.05 rows=1 width=8) (actual
time=0.013..0.014 rows=1 loops=1)
Filter: (group_id = 4)
-> Nested Loop (cost=6.97..13694.00 rows=4279 width=116) (actual
time=0.587..79.616 rows=165 loops=1)
-> Hash Join (cost=6.97..12343.10 rows=4279 width=24)
(actual time=0.568..78.230 rows=165 loops=1)
Hash Cond: (posts.poster_id = posters.poster_id)
-> Seq Scan on posts (cost=0.00..11862.12 rows=112312
width=24) (actual time=0.019..60.092 rows=112312 loops=1)
-> Hash (cost=6.79..6.79 rows=14 width=24) (actual
time=0.101..0.101 rows=14 loops=1)
-> Hash Join (cost=2.14..6.79 rows=14 width=24)
(actual time=0.060..0.093 rows=14 loops=1)
Hash Cond: (posters.poster_id =
posters_groups_1.poster_id)
-> Seq Scan on posters (cost=0.00..3.83
rows=183 width=8) (actual time=0.006..0.023 rows=185 loops=1)
-> Hash (cost=1.96..1.96 rows=14
width=16) (actual time=0.025..0.025 rows=14 loops=1)
-> Seq Scan on posters_groups
posters_groups_1 (cost=0.00..1.96 rows=14 width=16) (actual
time=0.016..0.021 rows=14 loops=1)
Filter: (group_id = 4)
-> Index Scan using threads_pkey on threads
(cost=0.00..0.30 rows=1 width=100) (actual time=0.006..0.007 rows=1
loops=165)
Index Cond: (threads.internal_thread_id =
posts.internal_thread_id)
Total runtime: 80.137 ms
(20 rows)

So the big time lost is in this line:

Seq Scan on posts (cost=0.00..11862.12 rows=112312 width=24) (actual
time=0.019..60.092 rows=112312 loops=1)

which I can understand why it slow ;)

But I haven't yet managed to convert the Seq Scan into an Index Scan, and
I'm not sure how to continue there.

As I am not a big expert on psql optimization, any input would be greatly
appreciated.

Best regards,
Jens

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jenish 2011-06-27 14:22:58 Performance issue with Insert
Previous Message Greg Smith 2011-06-24 23:26:31 Re: Cost of creating an emply WAL segment