
Re: On-disk bitmap index patch

From:	Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject:	Re: On-disk bitmap index patch
Date:	2006-07-27 02:00:18
Message-ID:	44C81E32.8090500@paradise.net.nz
Lists:	pgsql-hackers

Tom Lane wrote:
>
>
> I'm surprised no one caught me making this bogus computation. I
> realized this morning it's wrong: if there are 10000 distinct values
> then on the average the 1-bits would be about 10000 bits apart, not 100.

Right - I didn't think 10000 was *that* bad, but was too sleepy to try
working it out :-).
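
A minimal Python sketch of the corrected arithmetic follows. It assumes each row's value is drawn uniformly from `n_distinct` values, so any given row sets a bit in one value's bitmap with probability 1/n_distinct and the gaps between 1-bits average about n_distinct. The function name `mean_gap` and the row counts are illustrative only and are not taken from the patch.

```python
import random

def mean_gap(n_rows: int, n_distinct: int, seed: int = 1) -> float:
    """Mean distance between consecutive 1-bits in one value's bitmap,
    assuming every row holds that value with probability 1/n_distinct."""
    rng = random.Random(seed)
    p = 1.0 / n_distinct
    # Row numbers whose bit is set in this value's bitmap.
    positions = [i for i in range(n_rows) if rng.random() < p]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

# Compare 100 distinct values against 10000 distinct values.
results = {d: mean_gap(2_000_000, d) for d in (100, 10_000)}
for d, gap in results.items():
    print(f"{d} distinct values: 1-bits about {gap:.0f} bits apart")
```

The gaps come out near 100 and near 10000 respectively, matching the corrected estimate rather than the original one.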

>
>
> I don't believe the 100x numbers that have been
> bandied around in this discussion, but 10x is plenty enough to be
> interesting.
>

Yep - I have not managed to get 100x in any of my tests. However, I do
see results of about half that for the TPCH scale 10 dataset:

The orders_o_orderpriority and orders_o_orderstatus bitmap indexes are
46 and 57 times smaller than their btree counterparts (hmm...might we
see even better compression for larger scale factors?).

An obvious deduction is that the TPCH dataset is much more amenable to
run compression than my synthetic Zipfian data was. The interesting
question is how run-compressible "real" datasets are; I suspect
"better than my Zipfian data" is a safe assumption!
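
To illustrate why data layout matters for run compression, here is a minimal Python sketch. It uses plain run-length counting as a stand-in for whatever encoding the patch actually uses, and the column (100000 rows, 5 distinct values) is synthetic and hypothetical. Both columns hold exactly the same values; only the physical row order differs.

```python
import random

def count_runs(bits):
    """Number of maximal runs of equal bits; a simple run-length
    encoding stores roughly one (bit, length) pair per run."""
    runs = 0
    prev = None
    for b in bits:
        if b != prev:
            runs += 1
            prev = b
    return runs

def total_runs(column, n_distinct):
    # Sum the runs over the per-value bitmaps of the column.
    return sum(count_runs([v == k for v in column]) for k in range(n_distinct))

rng = random.Random(42)
n_rows, n_distinct = 100_000, 5
shuffled = [rng.randrange(n_distinct) for _ in range(n_rows)]
clustered = sorted(shuffled)  # same values, physically grouped together

clustered_runs = total_runs(clustered, n_distinct)
shuffled_runs = total_runs(shuffled, n_distinct)
print("clustered:", clustered_runs)  # 2 + 3 + 3 + 3 + 2 = 13 runs
print("shuffled: ", shuffled_runs)
```

With the values grouped, each bitmap is a handful of runs; with the same values scattered, the run count grows with the table size, so cardinality alone does not determine how well a column compresses.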

Cheers

Mark

In response to

Re: On-disk bitmap index patch at 2006-07-26 14:26:01 from Tom Lane

Responses

Re: On-disk bitmap index patch at 2006-07-27 05:14:43 from Tom Lane
