Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-27 05:08:41
Message-ID: 4AE68059.7040601@krogh.cc
Lists: pgsql-performance

Craig Ringer wrote:
> On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote:
>
>> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>>
>> The dataset consists of randomized words, but all records contain
>> "commonterm", around 80% contain "commonterm80", and so on.
>>
>> my $rand = rand();
>> push @doc,"commonterm" if $commonpos == $j;
>> push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
>
> You should probably re-generate your random value for each call rather
> than store it. Currently, every document with commonterm20 is guaranteed
> to also have commonterm40, commonterm60, etc, which probably isn't very
> realistic, and also makes doc size correlate with word rarity.

I had that in the first version, but I wanted the guarantee that
commonterm60 is indeed a subset of commonterm80, so that's why it's
structured like that. I know it's not realistic, but it gives measurable
results since I know my queries will hit the same tuples.
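
To make the nesting concrete, here is a minimal sketch in Python (not the
Perl generator script itself). It assumes thresholds of 0.8, 0.6, 0.4 and
0.2 for commonterm80 down to commonterm20; only the 0.8 case appears in
the snippet quoted above, the rest is my reading of "and so on". The
function names and the `nested` flag are made up for illustration.

```python
import random

# Assumed thresholds: only the 0.8 case is shown in the quoted Perl snippet.
THRESHOLDS = [(80, 0.8), (60, 0.6), (40, 0.4), (20, 0.2)]


def make_terms(rng, nested=True):
    """Return the set of commonterm markers for one generated document."""
    terms = {"commonterm"}       # every document gets the base term
    draw = rng.random()          # one draw per document (nested variant)
    for pct, p in THRESHOLDS:
        if not nested:
            draw = rng.random()  # fresh draw per term (independent variant)
        if draw < p:
            terms.add(f"commonterm{pct}")
    return terms


def is_nested(docs):
    """True if every document holding a rarer term also holds the next commoner one."""
    order = ["commonterm20", "commonterm40", "commonterm60",
             "commonterm80", "commonterm"]
    return all(
        rarer not in doc or commoner in doc
        for doc in docs
        for rarer, commoner in zip(order, order[1:])
    )


rng = random.Random(0)
nested_docs = [make_terms(rng, nested=True) for _ in range(5000)]
independent_docs = [make_terms(rng, nested=False) for _ in range(5000)]
frac80 = sum("commonterm80" in d for d in nested_docs) / len(nested_docs)
```

With a single draw per document, a document that gets commonterm60 always
gets commonterm80 as well, so the commonterm60 matches are a strict subset
of the commonterm80 matches. With a fresh draw per term that guarantee is
lost, which is the variant I moved away from.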

I fail to see how this should have any direct effect on query time?

>> Given that the seq-scan has to visit 50K rows to create the result and
>> the bitmap heap scan only has to visit 40K (but search the index) we
>> would expect the seq-scan to be at most 25% more expensive than the
>> bitmap-heap scan, e.g. less than 300ms.
>
> I suspect table bloat. Try VACUUMing your table and trying again.

No bloat here:
ftstest=# VACUUM FULL VERBOSE ftstest;
INFO: vacuuming "public.ftstest"
INFO: "ftstest": found 0 removable, 50000 nonremovable row versions in
10000 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 1352 to 1652 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 6859832 bytes.
0 pages are or will become empty, including 0 at the end of the table.
536 pages containing 456072 free bytes are potential move destinations.
CPU 0.03s/0.03u sec elapsed 0.06 sec.
INFO: index "ftstest_id_key" now contains 50000 row versions in 139 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.13 sec.
INFO: index "ftstest_gin_idx" now contains 50000 row versions in 35792
pages
DETAIL: 0 index pages have been deleted, 25022 are currently reusable.
CPU 0.46s/0.11u sec elapsed 11.16 sec.
INFO: "ftstest": moved 0 row versions, truncated 10000 to 10000 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: vacuuming "pg_toast.pg_toast_908525"
INFO: "pg_toast_908525": found 0 removable, 100000 nonremovable row
versions in 16710 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 270 to 2032 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 3695712 bytes.
0 pages are or will become empty, including 0 at the end of the table.
5063 pages containing 1918692 free bytes are potential move destinations.
CPU 0.38s/0.17u sec elapsed 2.64 sec.
INFO: index "pg_toast_908525_index" now contains 100000 row versions in
276 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.28 sec.
INFO: "pg_toast_908525": moved 0 row versions, truncated 16710 to 16710
pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
ftstest=#

> In this sort of test it's often a good idea to TRUNCATE the table before
> populating it with a newly generated data set. That helps avoid any
> residual effects from table bloat etc from lingering between test runs.

As you can see in the scripts, the table is dropped just before it is
recreated and filled with data.

Did you try to re-run the test?

Jesper
