Skip site navigation (1) Skip section navigation (2)

full-text search doesn't fall back on sequential scan when it could

From: "James Dietrich" <jdietrch(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: full-text search doesn't fall back on sequential scan when it could
Date: 2008-07-15 20:51:30
Message-ID: 9dd1c6ea0807151351g52776137nc0fdfcab59e95b5@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-bugs
Hi,

I am running PostgreSQL 8.3.3 on Linux 2.6.18
jdietrch(at)io:~/z$ uname -a
Linux io.mills.lan 2.6.18-6-vserver-k7 #1 SMP Sat Jun 7 01:10:29 UTC
2008 i686 GNU/Linux

Short version of the problem:
When I run a full-text search that requires a full scan of the table,
an error is returned (GIN indexes don't support sequential scan)
instead of falling back on a sequential scan which would return
the results of the query.

Long version:
I have a database with two tables named 'one' and 'two':

z1=> \d one
    Table "public.one"
 Column | Type | Modifiers
--------+------+-----------
 a      | text |
Indexes:
    "fts_a" gin (to_tsvector('simple'::regconfig, a))

z1=> \d two
    Table "public.two"
 Column | Type | Modifiers
--------+------+-----------
 b      | text |
Indexes:
    "fts_b" gin (to_tsvector('simple'::regconfig, b))

Table 'one' has 51 rows:
z1=> select * from one;
      a
--------------
 Two
 Three
 Four
 Five
****<snip>****
 Forty nine
 Fifty
 Fifty one
 Fifty two
(51 rows)

Table 'two' has 5001 rows:
z1=> select * from two;

                     b
--------------------------------------------
 Fifty three
 Fifty four
 Fifty five
 Fifty six
****<snip>****
 Five thousand fifty
 Five thousand fifty one
 Five thousand fifty two
 Five thousand fifty three
(5001 rows)

(At the bottom of this email I have copied the commands
I used to create the database.)

Now I run a full-text query on table 'one':
z1=> select count(*) from one where to_tsvector('simple', a) @@
to_tsquery('simple', '!Four');
 count
-------
    47
(1 row)

Running the same query under 'explain analyze' shows that the index is not being
used, but a sequential scan is being done:
z1=> explain analyze select a from one where to_tsvector('simple', a)
@@ to_tsquery('simple', '!Four');
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on one  (cost=0.00..1.77 rows=1 width=32) (actual
time=0.019..0.309 rows=47 loops=1)
   Filter: (to_tsvector('simple'::regconfig, a) @@ '!''four'''::tsquery)
 Total runtime: 0.341 ms
(3 rows)

That all works fine.

But all is not fine when I do the same thing on the longer table 'two':
z1=> \set VERBOSITY verbose
z1=> select count(*) from two where to_tsvector('simple', b) @@
to_tsquery('simple', '!Four');
ERROR:  0A000: query requires full scan, which is not supported by GIN indexes
LOCATION:  gin_extract_tsquery, tsginidx.c:74

I understand that this query does require a full scan, and I understand that
GIN indexes don't support a full scan, but why couldn't the planner fall back
to a sequential scan in this case? Of course it's slower, but I would prefer a
slower answer than failure with an error and no answer at all. I can simulate
this solution by doing the following, which forces a sequential scan.
z1=> set enable_bitmapscan to off;
SET
z1=> set enable_indexscan to off;
SET
z1=> select count(*) from two where to_tsvector('simple', b) @@
to_tsquery('simple', '!Four');
 count
-------
  3277
(1 row)

z1=> explain analyze select count(*) from two where
to_tsvector('simple', b) @@ to_tsquery('simple', '!Four');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Aggregate  (cost=114.03..114.04 rows=1 width=0) (actual
time=91.171..91.171 rows=1 loops=1)
   ->  Seq Scan on two  (cost=0.00..114.02 rows=5 width=0) (actual
time=0.028..89.598 rows=3277 loops=1)
         Filter: (to_tsvector('simple'::regconfig, b) @@ '!''four'''::tsquery)
 Total runtime: 91.208 ms
(4 rows)

Any thoughts? Is this something that could be fixed in a future version of
PostgreSQL? Or is the current behavior intentionally the way it is for
some reason I haven't thought of yet?

Thanks a lot for your help,
James Dietrich

P.S. Here is the procedure I used create the test database.
I can also provide the output of pg_dump upon request.

jdietrch(at)io:~/z$ psql template1 -U stariadmin -W
Password for user stariadmin:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=> create database z1;
CREATE DATABASE
template1=> \q
jdietrch(at)io:~/z$ psql z1 -U stariadmin -W -f a_commands
Password for user stariadmin:
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE SEQUENCE
CREATE LANGUAGE
CREATE FUNCTION
 nextval
---------
       1
(1 row)

CREATE FUNCTION
CREATE FUNCTION
 fill_one
----------
 done
(1 row)

 fill_two
----------
 done
(1 row)

jdietrch(at)io:~/z$ cat a_commands
create table one(a text);
create index fts_a on one using gin(to_tsvector('simple', a));
create table two(b text);
create index fts_b on two using gin(to_tsvector('simple', b));
create sequence sequ;
create language plpgsql;
create or replace function num_word(out m_word text) as $$
  declare
    n integer;
    o money;
  begin
    select into o nextval('sequ')::text::money;
    select into m_word trim(substring(cash_words(o), 1,
position('dollars' in cash_words(o))-1));
  end;
$$ language plpgsql;

select nextval('sequ');

create or replace function fill_one(out f_o text) as $$
  begin
    for i in 0..50 loop
      insert into one values(num_word());
    end loop;
    f_o = 'done';
  end;
$$ language plpgsql;

create or replace function fill_two(out f_tw text) as $$
  begin
    for i in 0..5000 loop
      insert into two values(num_word());
    end loop;
    f_tw = 'done';
  end;
$$ language plpgsql;

select fill_one();
select fill_two();
jdietrch(at)io:~/z$

pgsql-bugs by date

Next:From: Tom LaneDate: 2008-07-15 23:23:20
Subject: Re: BUG #4307: INSERT fails with primary key contraint
Previous:From: Oskars OzolsDate: 2008-07-15 19:24:57
Subject: BUG #4307: INSERT fails with primary key contraint

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group