text search: tablescan cost for a tsvector

From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: <pgsql-performance(at)postgresql(dot)org>
Cc: <oleg(at)sai(dot)msu(dot)su>
Subject: text search: tablescan cost for a tsvector
Date: 2012-02-06 11:05:28
Message-ID: C4DAC901169B624F933534A26ED7DF310861B363@JENMAIL01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I have quite systematically better performance with the text search when
I disable the statistics collection for the tsvector column.
So I wonder if such statistics ever make sense.

Here a testcase:

The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla'
exists in all tsvector:
Without statistics, the planner decide as expected for the gin index.
After analyze, it switch to a table scan which is also expected, but the
query is 3 times slower.

My first thought was that the Bitmap Heap Scan was really fast as the
searched term is always at the first position.
So I repeated the test with an additional search term at the last
position, but without significant change:

(result from the 6. test below)

without analyze: http://explain.depesz.com/s/6At
with analyze: http://explain.depesz.com/s/r3B

best regards,

Marc Mamin

Here all my results, always one of the fastest from a few runs.

CREATE TABLE tsv_test
(
id bigserial NOT NULL,
v tsvector
);

<The code to fill the table with test data can be found below>

The test query:

explain analyze
select id from tsv_test where v @@ 'lexeme3179'::tsquery
UNION ALL
select id from tsv_test where v @@ 'lexeme5'::tsquery
UNION ALL
select id from tsv_test where v @@ 'fooblablabla'::tsquery

The results

A) on first lexeme

1) without indexes without analyze:
http://explain.depesz.com/s/bOv

2) alter table tsv_test add constraint tsv_test_pk primary key(id);
http://explain.depesz.com/s/9QQ (same as previous);

3) create index tsv_gin on tsv_test using gin(v);
http://explain.depesz.com/s/r4M <= fastest

4) ANALYZE tsv_test (id);
http://explain.depesz.com/s/MyC (same as previous);

5) ANALYZE tsv_test;
http://explain.depesz.com/s/qu3S

B) on lastlexeme

6) create table tsv_test2 as select id,
v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector
from tsv_test;

explain analyze
select id from tsv_test2 where v @@
'zzthisisalongerlexemethisisalongerlexeme'::tsquery

http://explain.depesz.com/s/6At

ANALYZE tsv_test2;

http://explain.depesz.com/s/r3B

test data:

insert into tsv_test (v)
select
cast('fooblablabla' ||
' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4||
' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6||
' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9||
' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 ||
' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 ||
' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 ||
' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 ||
' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 ||
' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 ||
' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 ||
' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 ||
' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 ||
' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 ||
' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 ||
' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 ||
' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 ||
' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 ||
' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 ||
' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 ||
' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 ||
' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 ||
' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 ||
' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 ||
' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 ||
' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 ||
' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 ||
' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 ||
' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 ||
' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s
UNION ALL
select
cast('fooblablabla' ||
' thisisalongerlexemethisisalongerlexeme'||s%2|| '
thisisalongerlexemethisisalongerlexeme'||s%3|| '
thisisalongerlexemethisisalongerlexeme'||s%4||
' thisisalongerlexemethisisalongerlexeme'||s%4|| '
thisisalongerlexemethisisalongerlexeme'||s%5|| '
thisisalongerlexemethisisalongerlexeme'||s%6||
' thisisalongerlexemethisisalongerlexeme'||s%7|| '
thisisalongerlexemethisisalongerlexeme'||s%8|| '
thisisalongerlexemethisisalongerlexeme'||s%9||
' thisisalongerlexemethisisalongerlexeme'||s%10 || '
thisisalongerlexemethisisalongerlexeme2'||s%11 || '
thisisalongerlexemethisisalongerlexeme3'||s%12 ||
' thisisalongerlexemethisisalongerlexeme'||s%11 || '
thisisalongerlexemethisisalongerlexeme2'||s%12 || '
thisisalongerlexemethisisalongerlexeme3'||s%22 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%13 || '
thisisalongerlexemethisisalongerlexeme3'||s%32 ||
' thisisalongerlexemethisisalongerlexeme'||s%13 || '
thisisalongerlexemethisisalongerlexeme2'||s%14 || '
thisisalongerlexemethisisalongerlexeme3'||s%42 ||
' thisisalongerlexemethisisalongerlexeme'||s%14 || '
thisisalongerlexemethisisalongerlexeme2'||s%15 || '
thisisalongerlexemethisisalongerlexeme3'||s%52 ||
' thisisalongerlexemethisisalongerlexeme'||s%15 || '
thisisalongerlexemethisisalongerlexeme2'||s%16 || '
thisisalongerlexemethisisalongerlexeme3'||s%62 ||
' thisisalongerlexemethisisalongerlexeme'||s%16 || '
thisisalongerlexemethisisalongerlexeme2'||s%17 || '
thisisalongerlexemethisisalongerlexeme3'||s%72 ||
' thisisalongerlexemethisisalongerlexeme'||s%17 || '
thisisalongerlexemethisisalongerlexeme2'||s%18 || '
thisisalongerlexemethisisalongerlexeme3'||s%82 ||
' thisisalongerlexemethisisalongerlexeme'||s%18 || '
thisisalongerlexemethisisalongerlexeme2'||s%19 || '
thisisalongerlexemethisisalongerlexeme3'||s%92 ||
' thisisalongerlexemethisisalongerlexeme'||s%19 || '
thisisalongerlexemethisisalongerlexeme2'||s%10 || '
thisisalongerlexemethisisalongerlexeme3'||s%15 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%71 || '
thisisalongerlexemethisisalongerlexeme3'||s%16 ||
' thisisalongerlexemethisisalongerlexeme'||s%20 || '
thisisalongerlexemethisisalongerlexeme2'||s%81 || '
thisisalongerlexemethisisalongerlexeme3'||s%17 ||
' thisisalongerlexemethisisalongerlexeme'||s%35 || '
thisisalongerlexemethisisalongerlexeme2'||s%91 || '
thisisalongerlexemethisisalongerlexeme3'||s%18 ||
' thisisalongerlexemethisisalongerlexeme'||s%100 || '
thisisalongerlexemethisisalongerlexeme2'||s%110 || '
thisisalongerlexemethisisalongerlexeme3'||s%120 ||
' thisisalongerlexemethisisalongerlexeme'||s%110 || '
thisisalongerlexemethisisalongerlexeme2'||s%120 || '
thisisalongerlexemethisisalongerlexeme3'||s%220 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%130 || '
thisisalongerlexemethisisalongerlexeme3'||s%320 ||
' thisisalongerlexemethisisalongerlexeme'||s%130 || '
thisisalongerlexemethisisalongerlexeme2'||s%140 || '
thisisalongerlexemethisisalongerlexeme3'||s%420 ||
' thisisalongerlexemethisisalongerlexeme'||s%140 || '
thisisalongerlexemethisisalongerlexeme2'||s%150 || '
thisisalongerlexemethisisalongerlexeme3'||s%520 ||
' thisisalongerlexemethisisalongerlexeme'||s%150 || '
thisisalongerlexemethisisalongerlexeme2'||s%160 || '
thisisalongerlexemethisisalongerlexeme3'||s%620 ||
' thisisalongerlexemethisisalongerlexeme'||s%160 || '
thisisalongerlexemethisisalongerlexeme2'||s%170 || '
thisisalongerlexemethisisalongerlexeme3'||s%720 ||
' thisisalongerlexemethisisalongerlexeme'||s%170 || '
thisisalongerlexemethisisalongerlexeme2'||s%180 || '
thisisalongerlexemethisisalongerlexeme3'||s%820 ||
' thisisalongerlexemethisisalongerlexeme'||s%180 || '
thisisalongerlexemethisisalongerlexeme2'||s%190 || '
thisisalongerlexemethisisalongerlexeme3'||s%920 ||
' thisisalongerlexemethisisalongerlexeme'||s%190 || '
thisisalongerlexemethisisalongerlexeme2'||s%100 || '
thisisalongerlexemethisisalongerlexeme3'||s%150 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%710 || '
thisisalongerlexemethisisalongerlexeme3'||s%160 ||
' thisisalongerlexemethisisalongerlexeme'||s%200 || '
thisisalongerlexemethisisalongerlexeme2'||s%810 || '
thisisalongerlexemethisisalongerlexeme3'||s%170 ||
' thisisalongerlexemethisisalongerlexeme'||s%350 || '
thisisalongerlexemethisisalongerlexeme2'||s%910 || '
thisisalongerlexemethisisalongerlexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ofer Israeli 2012-02-07 10:18:35 Inserts or Updates
Previous Message Saurabh 2012-02-05 17:29:22 Re: How to improve insert speed with index on text column