Skip site navigation (1) Skip section navigation (2)

bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-26 20:02:57
Message-ID: 4AE60071.5030301@krogh.cc (view raw or flat)
Thread:
Lists: pgsql-performance
Hi.

I'm currently trying to figure out why the tsearch performance seems to
vary a lot between different queryplans. I have created a sample dataset
that sort of resembles the data I have to work on.

The script that builds the dataset is at:
http://krogh.cc/~jesper/build-test.pl
and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of words randomized, but .. all records contains
"commonterm", around 80% contains commonterm80 and so on..

	my $rand = rand();
	push @doc,"commonterm" if $commonpos == $j;
	push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

Results are run multiple times after each other so they should be
reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual
time=0.102..1792.215 rows=40082 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188
width=4) (actual time=17.445..197.356 rows=40082 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09
rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 204.201 ms
(5 rows)

Given that the seq-scan have to visit 50K row to create the result and
the bitmap heap scan only have to visit 40K (but search the index) we
would expect the seq-scan to be at most 25% more expensive than the
bitmap-heap scan.. e.g. less than 300ms.

Jesper
-- 
Jesper

Responses

pgsql-performance by date

Next:From: Craig RingerDate: 2009-10-27 04:57:05
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Previous:From: WaldomiroDate: 2009-10-26 19:56:12
Subject: Re: query planning different in plpgsql?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group