Re: bitmap heap scan recheck on full text search with gin index

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Hal Roberts <hroberts(at)cyber(dot)law(dot)harvard(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmap heap scan recheck on full text search with gin index
Date: 2009-02-06 04:22:17
Message-ID: Pine.LNX.4.64.0902060720070.4158@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hal,

just create separate column with tsvector and create index on it.

Oleg
On Thu, 5 Feb 2009, Hal Roberts wrote:

> Hi All,
>
> I'm getting poor performance on full text searches that return lots of
> entries from a table with about 7 million rows. I think the cause is
> rechecking the text match on all of the returned rows, even though I'm using
> a @@ query on a gin index, which the docs say should not require a recheck.
>
> Here's the table:
>
> ****
> mediacloud=> \d download_texts;
> Table "public.download_texts"
> Column | Type | Modifiers
> -------------------+---------+----------------------------------------------------------------------------
> download_texts_id | integer | not null default
> nextval('download_texts_download_texts_id_seq'::regclass)
> downloads_id | integer | not null
> download_text | text | not null
> Indexes:
> "download_texts_pkey" PRIMARY KEY, btree (download_texts_id)
> "download_texts_downloads_id_index" UNIQUE, btree (downloads_id)
> "download_texts_textsearch_idx" gin (to_tsvector('english'::regconfig,
> download_text)), tablespace "large_table_space"
> Foreign-key constraints:
> "download_texts_downloads_id_fkey" FOREIGN KEY (downloads_id) REFERENCES
> downloads(downloads_id)
> Tablespace: "large_table_space"
> ****
>
> And here's the query:
>
> ****
> mediacloud=> explain analyze select count(dt.download_texts_id) from
> download_texts dt where to_tsvector('english', download_text) @@
> to_tsquery('english', 'stimulus');
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=26161.16..26161.17 rows=1 width=4) (actual
> time=153640.083..153640.083 rows=1 loops=1)
> -> Bitmap Heap Scan on download_texts dt (cost=3937.41..26146.11 rows=6018
> width=4) (actual time=1957.074..153529.351 rows=72225 loops=1)
> Recheck Cond: (to_tsvector('english'::regconfig, download_text) @@
> '''stimulus'''::tsquery)
> -> Bitmap Index Scan on download_texts_textsearch_idx
> (cost=0.00..3935.90 rows=6018 width=0) (actual time=1048.556..1048.556
> rows=72225 loops=1)
> Index Cond: (to_tsvector('english'::regconfig, download_text) @@
> '''stimulus'''::tsquery)
> Total runtime: 153642.249 ms
> ****
>
> Nearly all of the time is being spent in the bitmap heap scan, I suspect
> because of the work of rereading and rechecking the text of all the matched
> entries. Is this indeed what's going on here? Is there any way to make
> postgres not do that recheck?
>
>
> Thanks!
>
> -hal
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Wong 2009-02-06 05:04:46 Re: dbt-2 tuning results with postgresql-8.3.5
Previous Message Hal Roberts 2009-02-05 23:13:36 bitmap heap scan recheck on full text search with gin index