Re: On-disk bitmap index patch

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-26 01:43:39
Message-ID: 44C6C8CB.9030802@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:

>
> And your point is? Assuming a reasonable datatype like int4, a btree
> index on this table would occupy say 20GB (16 bytes per entry plus
> fillfactor overhead). The bitmap version would require 10,000 bitmaps
> of 1G bits apiece, or 1250GB (not even counting overhead). You need
> some wildly optimistic assumptions about the compressibility of the
> bitmaps to get even within hailing distance of the btree, let alone
> fit in RAM. A realistic assumption for the numbers you mention is
> that the bitmaps have 1-bits about 100 bits apart, which means you
> could get maybe 3-to-1 compression using the runlength scheme that's
> in there ... leaving the bitmap a factor of 20 bigger than the btree.
>
> AFAICS "low cardinality" has to mean just that, a few dozen distinct
> values at most, for this scheme to have any hope.
>

I did a little testing of this when Jie first submitted the patch - for
a basically Zipfian distribution of int4's the bitmap is still clearly
smaller even at 1000 distinct values (see below). It is twice as big at
10000, so the crossover point is somewhere in the 1000-10000 range (for
this test - however the results seem to be reasonably typical).

In DSS distinct values < 1000 is very common (days of week, months of
year, lineitems of order, sex of animal...) so the applicability of this
range is perhaps larger than it seems at first sight.

Cheers

Mark
------------------------------------------------------------------------

bitmap=# \d bitmaptest
Table "public.bitmaptest"
Column | Type | Modifiers
--------+---------+-----------
id | integer | not null
val0 | integer |
val1 | integer |
val2 | integer |
val3 | integer |
fil | text |

bitmap=# select count(distinct val0),
count(distinct val1),
count(distinct val2),
count(distinct val3)
from bitmaptest;
count | count | count | count
-------+-------+-------+-------
10 | 100 | 1000 | 10000

bitmap=# \i relsizes.sql (BTREE)
relname | relpages
-----------------+----------
bitmaptest | 93458
bitmaptest_val0 | 21899
bitmaptest_val1 | 21899
bitmaptest_val2 | 21899
bitmaptest_val3 | 21899

bitmap=# \i relsizes.sql (BITMAP)
relname | relpages
-----------------+----------
bitmaptest | 93458
bitmaptest_val0 | 1286
bitmaptest_val1 | 2313
bitmaptest_val2 | 5726
bitmaptest_val3 | 41292

For the interested, the data generator, schema, index files are here:
http://homepages.paradise.net.nz/markir/download/bizgres/bitmaptest.tar.gz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-07-26 02:06:34 Re: [HACKERS] Resurrecting per-page cleaner for btree
Previous Message Bruce Momjian 2006-07-26 01:38:21 Re: [HACKERS] Resurrecting per-page cleaner for btree