Re: Bitmap index thoughts

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index thoughts
Date: 2006-12-27 09:45:58
Message-ID: C1B780D6.C771%jzhang@greenplum.com
Lists: pgsql-hackers


>> And instead of having separate LOV pages that store a number of LOV
>> items, how about storing each LOV item on a page of its own, and using
>> the rest of the page to store the last chunk of the bitmap. That would
>> eliminate one page access, but more importantly, maybe we could then get
>> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
>> patch quite a bit, while preserving the performance.
>
> That's an interesting approach. We would still need a concept of
> last_word, at the very least, and probably last_comp_word for convenience.
> Your idea doesn't require any extra space, either, which is good.
> Something I've been working through is the idea of a 'bitmap data
> segment'. Currently, we store the HRL compressed bitmap data to the extent
> of the page. That is, sizeof(BMBitmap) is as close to BLCKSZ as we can
> make it. The problem is that we may have some bitmaps where a few values
> occur only a small number of times and are well clustered at the beginning
> of the heap. In that circumstance, we use a whole page to store a small
> number of words and the free space cannot be used by any other vector.
> Now, if a segment were some fraction of the size of BLCKSZ, we would use
> less space for those vectors with few tuples in the heap. Just an idea...

The "bitmap data segment" sounds good in terms of space. The problem is that
a single bitmap is then likely to span more pages than before, which may hurt
query performance. I have been thinking along the lines of increasing the
number of last bitmap words stored in each LOV item, but without giving each
item a whole page. This may avoid some of the cases Gavin described here, but
not all.
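
To make the trade-off concrete, here is a rough sketch in C. It is not code
from the patch: WORD_SIZE, alloc_bytes() and units_needed() are hypothetical
names, BLCKSZ is assumed to be the default 8192, and page and segment headers
are ignored. It only compares how many bytes get reserved for one vector, and
how many separate storage units that vector is split across, when storage is
handed out in whole pages versus smaller segments.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ    8192   /* assumed: PostgreSQL's default block size */
#define WORD_SIZE 4      /* assumed: bytes per compressed bitmap word */

/*
 * Number of storage units (whole pages, or segments) needed to hold a
 * vector of nwords bitmap words when each unit is unit_size bytes.
 * Headers are ignored to keep the arithmetic simple.
 */
static size_t
units_needed(size_t nwords, size_t unit_size)
{
    size_t words_per_unit = unit_size / WORD_SIZE;

    assert(words_per_unit > 0);
    /* round up: a partly filled unit is still a whole unit */
    return (nwords + words_per_unit - 1) / words_per_unit;
}

/*
 * Bytes reserved for the vector, including the unused tail of its
 * last unit.
 */
static size_t
alloc_bytes(size_t nwords, size_t unit_size)
{
    return units_needed(nwords, unit_size) * unit_size;
}
```

Under these assumptions, a 10-word vector reserves 8192 bytes with page-sized
units but only 1024 bytes with segments of BLCKSZ / 8. A 5000-word vector, on
the other hand, fits in 3 pages but is split into 20 such segments, and if
those segments are not placed contiguously a scan could touch more than 3
pages.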

Thanks,
Jie
