Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-27 05:33:54
Message-ID: 1256621634.1709.41.camel@wallace.localnet
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:

> > You should probably re-generate your random value for each call rather
> > than store it. Currently, every document with commonterm20 is guaranteed
> > to also have commonterm40, commonterm60, etc, which probably isn't very
> > realistic, and also makes doc size correlate with word rarity.
>
> I had that in the first version, but I wanted to have the gaurantee that
> a commonterm60 was indeed a subset of commonterm80, so that why its
> sturctured like that. I know its not realistic, but it gives measureable
> results since I know my queries will hit the same tuples.
>
> I fail to see how this should have any direct effect on query time?

Probably not, in truth, but with the statistics-based planner I'm
occasionally surprised by what can happen.

>
> > In this sort of test it's often a good idea to TRUNCATE the table before
> > populating it with a newly generated data set. That helps avoid any
> > residual effects from table bloat etc from lingering between test runs.
>
> As you could see in the scripts, the table is dropped just before its
> recreated and filled with data.
>
> Did you try to re-run the test?

No, I didn't. I thought it worth checking if bloat might be the result
first, though I should've read the scripts to confirm you weren't
already handling that possibility.

Anyway, I've done a run to generate your data set and run a test. After
executing the test statement twice (once with and once without
enable_seqscan) to make sure all data is in cache and not being read
from disk, when I run the tests here are my results:

test=> set enable_seqscan=on;
SET
test=> explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest (cost=36.96..227.10 rows=50 width=4) (actual time=15.830..134.194 rows=40061 loops=1)
Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
-> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..36.95 rows=50 width=0) (actual time=11.905..11.905 rows=40061 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 148.477 ms
(5 rows)

test=> set enable_seqscan=off;
SET
test=> explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest (cost=36.96..227.10 rows=50 width=4) (actual time=15.427..134.156 rows=40061 loops=1)
Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
-> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..36.95 rows=50 width=0) (actual time=11.739..11.739 rows=40061 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 148.583 ms
(5 rows)

Any chance your disk cache was cold on the first test run, so Pg was
having to read the table from disk during the seqscan, and could just
use shared_buffers when you repeated the test for the index scan?

--
Craig Ringer

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jesper Krogh 2009-10-27 05:44:35 Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Previous Message Jesper Krogh 2009-10-27 05:08:41 Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).