Quick Links

Re: On-disk bitmap index patch

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc:	"Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject:	Re: On-disk bitmap index patch
Date:	2006-07-27 06:50:35
Message-ID:	748.1153983035@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... A nonuniform distribution would probably mean that some
>> of the bitmaps compress better-than-expected and others worse. I have
>> no idea how to model that and guess what the overall result is ...

> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
> Wu et al gave an approximate answer for this question. Assume that there are
> c distinct values. Let the i-th value has a probability of p_i, the number
> of rows r, and the word size w. then the total size of the compressed bitmap
> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
> both \sum's, i is from 1 to c.

Hm, but that's still begging the question no? It's still assuming that
any one value is uniformly distributed. ISTM the cases that would break
my simplistic calculation involve clustering of particular values, such
that some areas of the bitmap are all-zero while other areas have lots
of ones.

regards, tom lane

In response to

Re: On-disk bitmap index patch at 2006-07-27 06:21:26 from Jie Zhang

Responses

Re: On-disk bitmap index patch at 2006-07-27 16:13:21 from Jie Zhang

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Glaesemann	2006-07-27 07:49:31	Re: GUC with units, details
Previous Message	Jie Zhang	2006-07-27 06:21:26	Re: On-disk bitmap index patch