Re: On-disk bitmap index patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 20:46:29
Message-ID: 24309.1153860389@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> One particular compelling situation for on-disk bitmaps is for terabyte
> tables where a btree index would not fit into memory. Index
> performance for an index which is 10x or more the size of RAM really
> sucks ... I can come up with some test results if you doubt that.

Sure...

> Also note that "low cardinality" is relative. For a 1 billion row
> table, a column with 10,000 values is "low-cardinality", having around
> 100,000 rows per value ... but that's still 0.01% of the table per
> value, making index use still applicable.

And your point is? Assuming a reasonable datatype like int4, a btree
index on this table would occupy say 20GB (16 bytes per entry plus
fillfactor overhead). The bitmap version would require 10,000 bitmaps
of 1G bits apiece, or 1250GB (not even counting overhead). You need
some wildly optimistic assumptions about the compressibility of the
bitmaps to get even within hailing distance of the btree, let alone
fit in RAM. A realistic assumption for the numbers you mention is
that the bitmaps have 1-bits about 100 bits apart, which means you
could get maybe 3-to-1 compression using the runlength scheme that's
in there ... leaving the bitmap a factor of 20 bigger than the btree.

AFAICS "low cardinality" has to mean just that, a few dozen distinct
values at most, for this scheme to have any hope.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2006-07-25 21:09:58 Re: effective_cache_size is a real?
Previous Message Luke Lonergan 2006-07-25 20:39:11 Re: On-disk bitmap index patch