Index Scan/Bitmap Index Scan for queries with FTS and ORDER+LIMIT

From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Index Scan/Bitmap Index Scan for queries with FTS and ORDER+LIMIT
Date: 2009-10-21 12:48:12
Message-ID: c3a7de1f0910210548v6b971f13rf8ccdad9cba1e7c0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi, All

Well what do we have?

8 core, 32 GB, RAID 10, CentOS 5.2, Pg 8.3

A query using tsearch in WHERE block with ORDER and LIMIT:
select * from test_table where obj_tsvector @@
make_tsquery('some_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;

Two indexes - one for FTS, another for simple scan:
"i_test_table__tsvector_1" gist (obj_tsvector) WHERE obj_status_did = 1
"i_test_table__created" btree (obj_created) WHERE obj_status_did = 1

It's clear that using i_test_table__created index is better when
querying words occurring frequently in test_table and
i_test_table__tsvector_1 in opposite case. But with enable_indexscan
turned on planner force to use i_test_table__created that is worth for
querying sparce words.

Is there a way (or workaround) to make it use i_test_table__created
for frequent and i_test_table__tsvector_1 for sparse words? May be
some PG internal that would give us a hint whether the words is
frequent or not?

Here is a test that reflects the problem:

test_db=# set enable_indexscan to on;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('frequent_words', 'utf8_russian') and obj_status_did =
1 order by obj_created desc limit 10;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2843.83 rows=10 width=621) (actual
time=0.830..6.360 rows=10 loops=1)
-> Index Scan Backward using i_test_table__created on test_table
(cost=0.00..2235820.48 rows=7862 width=621) (actual time=0.829..6.355
rows=10 loops=1)
Filter: (obj_tsvector @@ '''frequent_words'''::tsquery)
Total runtime: 6.407 ms
(4 rows)

test_db=#
test_db=# set enable_indexscan to off;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('frequent_words', 'utf8_russian') and obj_status_did =
1 order by obj_created desc limit 10;
ERROR: canceling statement due to statement timeout

test_db=#
test_db=# set enable_indexscan to on;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('sparse_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;
ERROR: canceling statement due to statement timeout

test_db=#
test_db=# set enable_indexscan to off;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('sparse_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=29038.86..29038.89 rows=10 width=621) (actual
time=344.218..344.223 rows=10 loops=1)
-> Sort (cost=29038.86..29058.52 rows=7862 width=621) (actual
time=344.217..344.220 rows=10 loops=1)
Sort Key: obj_created
Sort Method: top-N heapsort Memory: 43kB
-> Bitmap Heap Scan on test_table (cost=469.20..28868.97
rows=7862 width=621) (actual time=292.314..344.176 rows=21 loops=1)
Recheck Cond: (obj_status_did = 1)
Filter: (obj_tsvector @@ '''sparse_words'''::tsquery)
-> Bitmap Index Scan on i_test_table__tsvector_1
(cost=0.00..467.23 rows=7862 width=0) (actual time=290.202..290.202
rows=2208 loops=1)
Index Cond: (obj_tsvector @@ '''sparse_words'''::tsquery)
Total runtime: 344.289 ms
(10 rows)

--
Regards,
Sergey Konoplev

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gaini Rajeshwar 2009-10-21 13:12:49 How to use Logical Operators in Fulltext Search?
Previous Message Pavel Stehule 2009-10-21 09:23:05 Re: How to send multiple parameters to a pl/pgsql function