full-text search doesn't fall back on sequential scan when it could

From: "James Dietrich" <jdietrch(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: full-text search doesn't fall back on sequential scan when it could
Date: 2008-07-15 20:51:30
Message-ID: 9dd1c6ea0807151351g52776137nc0fdfcab59e95b5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

I am running PostgreSQL 8.3.3 on Linux 2.6.18
jdietrch(at)io:~/z$ uname -a
Linux io.mills.lan 2.6.18-6-vserver-k7 #1 SMP Sat Jun 7 01:10:29 UTC
2008 i686 GNU/Linux

Short version of the problem:
When I run a full-text search that requires a full scan of the table,
an error is returned (GIN indexes don't support sequential scan)
instead of falling back on a sequential scan which would return
the results of the query.

Long version:
I have a database with two tables named 'one' and 'two':

z1=> \d one
Table "public.one"
Column | Type | Modifiers
--------+------+-----------
a | text |
Indexes:
"fts_a" gin (to_tsvector('simple'::regconfig, a))

z1=> \d two
Table "public.two"
Column | Type | Modifiers
--------+------+-----------
b | text |
Indexes:
"fts_b" gin (to_tsvector('simple'::regconfig, b))

Table 'one' has 51 rows:
z1=> select * from one;
a
--------------
Two
Three
Four
Five
****<snip>****
Forty nine
Fifty
Fifty one
Fifty two
(51 rows)

Table 'two' has 5001 rows:
z1=> select * from two;

b
--------------------------------------------
Fifty three
Fifty four
Fifty five
Fifty six
****<snip>****
Five thousand fifty
Five thousand fifty one
Five thousand fifty two
Five thousand fifty three
(5001 rows)

(At the bottom of this email I have copied the commands
I used to create the database.)

Now I run a full-text query on table 'one':
z1=> select count(*) from one where to_tsvector('simple', a) @@
to_tsquery('simple', '!Four');
count
-------
47
(1 row)

Running the same query under 'explain analyze' shows that the index is not being
used, but a sequential scan is being done:
z1=> explain analyze select a from one where to_tsvector('simple', a)
@@ to_tsquery('simple', '!Four');
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on one (cost=0.00..1.77 rows=1 width=32) (actual
time=0.019..0.309 rows=47 loops=1)
Filter: (to_tsvector('simple'::regconfig, a) @@ '!''four'''::tsquery)
Total runtime: 0.341 ms
(3 rows)

That all works fine.

But all is not fine when I do the same thing on the longer table 'two':
z1=> \set VERBOSITY verbose
z1=> select count(*) from two where to_tsvector('simple', b) @@
to_tsquery('simple', '!Four');
ERROR: 0A000: query requires full scan, which is not supported by GIN indexes
LOCATION: gin_extract_tsquery, tsginidx.c:74

I understand that this query does require a full scan, and I understand that
GIN indexes don't support a full scan, but why couldn't the planner fall back
to a sequential scan in this case? Of course it's slower, but I would prefer a
slower answer than failure with an error and no answer at all. I can simulate
this solution by doing the following, which forces a sequential scan.
z1=> set enable_bitmapscan to off;
SET
z1=> set enable_indexscan to off;
SET
z1=> select count(*) from two where to_tsvector('simple', b) @@
to_tsquery('simple', '!Four');
count
-------
3277
(1 row)

z1=> explain analyze select count(*) from two where
to_tsvector('simple', b) @@ to_tsquery('simple', '!Four');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=114.03..114.04 rows=1 width=0) (actual
time=91.171..91.171 rows=1 loops=1)
-> Seq Scan on two (cost=0.00..114.02 rows=5 width=0) (actual
time=0.028..89.598 rows=3277 loops=1)
Filter: (to_tsvector('simple'::regconfig, b) @@ '!''four'''::tsquery)
Total runtime: 91.208 ms
(4 rows)

Any thoughts? Is this something that could be fixed in a future version of
PostgreSQL? Or is the current behavior intentionally the way it is for
some reason I haven't thought of yet?

Thanks a lot for your help,
James Dietrich

P.S. Here is the procedure I used create the test database.
I can also provide the output of pg_dump upon request.

jdietrch(at)io:~/z$ psql template1 -U stariadmin -W
Password for user stariadmin:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

template1=> create database z1;
CREATE DATABASE
template1=> \q
jdietrch(at)io:~/z$ psql z1 -U stariadmin -W -f a_commands
Password for user stariadmin:
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE SEQUENCE
CREATE LANGUAGE
CREATE FUNCTION
nextval
---------
1
(1 row)

CREATE FUNCTION
CREATE FUNCTION
fill_one
----------
done
(1 row)

fill_two
----------
done
(1 row)

jdietrch(at)io:~/z$ cat a_commands
create table one(a text);
create index fts_a on one using gin(to_tsvector('simple', a));
create table two(b text);
create index fts_b on two using gin(to_tsvector('simple', b));
create sequence sequ;
create language plpgsql;
create or replace function num_word(out m_word text) as $$
declare
n integer;
o money;
begin
select into o nextval('sequ')::text::money;
select into m_word trim(substring(cash_words(o), 1,
position('dollars' in cash_words(o))-1));
end;
$$ language plpgsql;

select nextval('sequ');

create or replace function fill_one(out f_o text) as $$
begin
for i in 0..50 loop
insert into one values(num_word());
end loop;
f_o = 'done';
end;
$$ language plpgsql;

create or replace function fill_two(out f_tw text) as $$
begin
for i in 0..5000 loop
insert into two values(num_word());
end loop;
f_tw = 'done';
end;
$$ language plpgsql;

select fill_one();
select fill_two();
jdietrch(at)io:~/z$

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2008-07-15 23:23:20 Re: BUG #4307: INSERT fails with primary key contraint
Previous Message Oskars Ozols 2008-07-15 19:24:57 BUG #4307: INSERT fails with primary key contraint