Skip site navigation (1) Skip section navigation (2)

Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-27 05:44:35
Message-ID: 4AE688C3.2080207@krogh.cc (view raw or flat)
Thread:
Lists: pgsql-performance
Craig Ringer wrote:
> On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
> 
>>> You should probably re-generate your random value for each call rather
>>> than store it. Currently, every document with commonterm20 is guaranteed
>>> to also have commonterm40, commonterm60, etc, which probably isn't very
>>> realistic, and also makes doc size correlate with word rarity.
>> I had that in the first version, but I wanted to have the gaurantee that
>> a commonterm60 was indeed a subset of commonterm80, so that why its
>> sturctured like that. I know its not realistic, but it gives measureable
>> results since I know my queries will hit the same tuples.
>>
>> I fail to see how this should have any direct effect on query time?
> 
> Probably not, in truth, but with the statistics-based planner I'm
> occasionally surprised by what can happen.
> 
>>> In this sort of test it's often a good idea to TRUNCATE the table before
>>> populating it with a newly generated data set. That helps avoid any
>>> residual effects from table bloat etc from lingering between test runs.
>> As you could see in the scripts, the table is dropped just before its
>> recreated and filled with data.
>>
>> Did you try to re-run the test?
> 
> No, I didn't. I thought it worth checking if bloat might be the result
> first, though I should've read the scripts to confirm you weren't
> already handling that possibility.
> 
> Anyway, I've done a run to generate your data set and run a test. After
> executing the test statement twice (once with and once without
> enable_seqscan) to make sure all data is in cache and not being read
> from disk, when I run the tests here are my results:
> 
> 
> test=> set enable_seqscan=on;
> SET
> test=> explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');

Here you should search for "commonterm" not "commonterm80", commonterm
will go into a seq-scan. You're not testing the same thing as I did.

> Any chance your disk cache was cold on the first test run, so Pg was
> having to read the table from disk during the seqscan, and could just
> use shared_buffers when you repeated the test for the index scan?

they were run repeatedly.

In response to

Responses

pgsql-performance by date

Next:From: Craig RingerDate: 2009-10-27 06:14:37
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Previous:From: Craig RingerDate: 2009-10-27 05:33:54
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group