sequential scan unduly favored over text search gin index

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: sequential scan unduly favored over text search gin index
Date: 2011-06-20 15:38:59
Message-ID: 1308584339.2488.12.camel@dragflick
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a tsvector column docvector and a gin index on it
docmeta1_docvector_idx

I have a simple query "select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');"

I find that the planner chooses a sequential scan of the table even when
the index performs orders of magnitude. I set random_page_cost = 1.0 for
the database to favor index use. However, I still see that the cost
estimate for sequential scan of the entire table (23000) is cheaper than
the cost of using the index (33000). The time taken for sequential
access is 5200 ms and for index usage is only 85 ms.

Details here:

postgres version 9.0.2
statistics on docvector is set to 10000 and as you can see the row
estimates are fine.

lawdb=# \d docmeta1
Table "public.docmeta1"
Column | Type | Modifiers
-------------+-----------+-----------
tid | integer | not null
docweight | integer |
doctype | integer |
publishdate | date |
covertids | integer[] |
titlevector | tsvector |
docvector | tsvector |
Indexes:
"docmeta1_pkey" PRIMARY KEY, btree (tid)
"docmeta1_date_idx" btree (publishdate)
"docmeta1_docvector_idx" gin (docvector)
"docmeta1_title_idx" gin (titlevector)

lawdb=# SELECT relpages, reltuples FROM pg_class WHERE relname
='docmeta1';
relpages | reltuples
----------+-----------
18951 | 329940

lawdb=# explain analyze select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');
QUERY
PLAN

--------------------------------------------------------------------------------
-----------------------------------
Seq Scan on docmeta1 (cost=0.00..23075.25 rows=35966 width=427)
(actual time=0
.145..5189.556 rows=35966 loops=1)
Filter: (docvector @@ '''free'''::tsquery)
Total runtime: 5196.231 ms
(3 rows)

lawdb=# set enable_seqscan = off;
SET
lawdb=# explain analyze select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');
QUERY
PLAN

--------------------------------------------------------------------------------
-----------------------------------------------------------
Bitmap Heap Scan on docmeta1 (cost=14096.25..33000.83 rows=35966
width=427) (a
ctual time=9.543..82.754 rows=35966 loops=1)
Recheck Cond: (docvector @@ '''free'''::tsquery)
-> Bitmap Index Scan on docmeta1_docvector_idx (cost=0.00..14087.26
rows=35
966 width=0) (actual time=8.059..8.059 rows=35967 loops=1)
Index Cond: (docvector @@ '''free'''::tsquery)
Total runtime: 85.304 ms
(5 rows)

-Sushant.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-06-20 15:41:16 Re: Inoptimal query plan for max() and multicolumn index
Previous Message Claudio Freire 2011-06-20 09:45:00 Re: how to know slowly query in lock postgre