Re: On-disk bitmap index patch

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 16:13:21
Message-ID: C0EE3431.9116%jzhang@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/26/06 11:50 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
>> On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... A nonuniform distribution would probably mean that some
>>> of the bitmaps compress better-than-expected and others worse. I have
>>> no idea how to model that and guess what the overall result is ...
>
>> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
>> Wu et al gave an approximate answer for this question. Assume that there are
>> c distinct values. Let the i-th value has a probability of p_i, the number
>> of rows r, and the word size w. then the total size of the compressed bitmap
>> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
>> both \sum's, i is from 1 to c.
>
> Hm, but that's still begging the question no? It's still assuming that
> any one value is uniformly distributed. ISTM the cases that would break
> my simplistic calculation involve clustering of particular values, such
> that some areas of the bitmap are all-zero while other areas have lots
> of ones.

Yes, you are right -- each value is still uniformly distributed. But this
will be the worst case in terms of the size of a bitmap vector. As for how
to model the size of a bitmap vector for an non-uniformly distributed value,
that's a good question. I don't really know. But we do know the best case
and the worse case.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2006-07-27 16:15:12 Re: [HACKERS] [COMMITTERS] pgsql: /contrib/cube improvements:
Previous Message Jim Nasby 2006-07-27 16:08:52 Re: An appropriate place for UDF questions?