Re: Bitmap index thoughts

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index thoughts
Date: 2006-12-27 02:14:47
Message-ID: Pine.LNX.4.58.0612271300520.21742@linuxworld.com.au
Lists: pgsql-hackers

Hey Heikki,

On Tue, 26 Dec 2006, Heikki Linnakangas wrote:

> I've been skimming through the bitmap index patch...
>
> A scan needs to access at least five pages:
>
> 1. B-tree index (root+others, depending on depth)
> 2. The auxiliary heap page
> 3. bitmap index meta page
> 4. LOV page
> 5. bitmap page
>
> That seems like a lot of indirection. A high startup cost is probably ok

You're right: this is excessive, and it's something I'd hoped to have
ripped out, but...

> for typical bitmap index use cases and most of the needed pages should
> stay in memory, but could we simplify this? Why do we need the auxiliary
> heap, couldn't we just store the blk+offset of the LOV item directly in
> the b-tree index item?

The problem is that the b-tree code is very much tied to the heap, and I
don't want to modify it just to make bitmap indexes work (better). What's
really tempting is to manage our own balanced tree within the bitmap index
file(s) themselves: it would start from the metapage and simply spill to
other 'special' index pages when necessary. The catch is that we don't
have b-tree code generic enough to let us do this trivially -- consider
concurrency and WAL in particular, which we currently get for free. I
guess this is why I've been ignoring this issue :-).
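
To make that a bit more concrete, something along these lines is the kind
of thing I mean. This is purely a strawman; none of these structures or
names exist in the patch:

/*
 * Hypothetical on-disk structures for a balanced tree kept entirely
 * inside the bitmap index relation.  Illustrative only.
 */
#include "postgres.h"
#include "storage/block.h"
#include "storage/off.h"

/* Per-page data stored in the special space of each tree page */
typedef struct BMTreePageOpaqueData
{
    BlockNumber bm_rightlink;   /* right sibling at the same tree level */
    uint16      bm_level;       /* 0 = leaf level, pointing at LOV items */
    uint16      bm_flags;       /* leaf/internal, etc. */
} BMTreePageOpaqueData;

/*
 * A leaf entry maps a distinct indexed value straight to its LOV item,
 * removing the detour through the auxiliary heap.
 */
typedef struct BMTreeLeafEntryData
{
    BlockNumber  bm_lov_block;   /* block holding the LOV item */
    OffsetNumber bm_lov_offset;  /* offset of the LOV item on that block */
    /* the indexed key would follow, in index-tuple format */
} BMTreeLeafEntryData;

With something like that, a scan would go metapage -> tree pages -> LOV
page -> bitmap pages, dropping both the external b-tree and the auxiliary
heap lookup.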

> And instead of having separate LOV pages that store a number of LOV
> items, how about storing each LOV item on a page of its own, and using
> the rest of the page to store the last chunk of the bitmap. That would
> eliminate one page access, but more importantly, maybe we could then get
> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
> patch quite a bit, while preserving the performance.

That's an interesting approach. We would still need a concept of
last_word, at the very least, and probably last_comp_word for convenience.
Your idea doesn't require any extra space either, which is good.

Something I've been working through is the idea of a 'bitmap data
segment'. Currently we store the HRL-compressed bitmap data so that it
fills the whole page; that is, sizeof(BMBitmap) is as close to BLCKSZ as
we can make it. The problem is that some indexed values may occur only a
small number of times and be well clustered at the beginning of the heap.
In that case we use a whole page to store a handful of words, and the free
space cannot be used by any other vector. If a segment were instead some
fraction of BLCKSZ, we would use less space for those vectors with few
tuples in the heap. Just an idea...
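
Roughly, a segment-based layout could look something like the following.
The sizing and field names are just a strawman for discussion, not what
the patch does today:

/*
 * Strawman layout for fixed-size bitmap data segments, several of which
 * could share one index page.  Illustrative only.
 */
#include "postgres.h"
#include "storage/block.h"

typedef struct BMSegmentHeaderData
{
    uint16      bm_num_words;   /* HRL content words actually used */
    uint16      bm_flags;
    BlockNumber bm_next_block;  /* continuation if the vector spills over */
    uint16      bm_next_segno;  /* segment number within that block */
} BMSegmentHeaderData;

typedef struct BMSegmentData
{
    BMSegmentHeaderData bm_header;
    uint32      bm_words[1];    /* variable-length array of HRL words */
} BMSegmentData;

#define BM_NUM_SEGMENTS     4   /* say, four segments per page */
#define BM_SEGMENT_SIZE     (BLCKSZ / BM_NUM_SEGMENTS)

/* HRL words that fit in one segment */
#define BM_SEGMENT_WORDS \
    ((BM_SEGMENT_SIZE - MAXALIGN(sizeof(BMSegmentHeaderData))) / sizeof(uint32))

A sparse, well-clustered vector would then tie up one segment rather than
a whole page, and the remaining segments on the page stay available to
other vectors.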

Thanks,

Gavin

PS: Another version of the patch will be forthcoming shortly. I've been
working on compressing the data in memory during CREATE INDEX instead of
just managing arrays of TIDs as we did previously. The array of TIDs works
great for well-clustered data, but it stinks for poorly clustered data as
we approach maintenance_work_mem and have to swap a lot.
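
To give a feel for the direction, the build-side state could look
something like this. The names and the simplified word encoding below are
illustrative only and don't match the patch:

/*
 * Illustrative only: per-value build state that keeps the bitmap
 * HRL-compressed as TIDs stream in during CREATE INDEX, rather than
 * buffering raw TIDs.  A fill word here has the top bit set and a count
 * of all-zero 31-bit words in the low bits.
 */
#include "postgres.h"

#define HRL_BITS_PER_WORD   31
#define HRL_FILL_FLAG       ((uint32) 1 << 31)

typedef struct BMBuildVector
{
    uint64      bm_last_wordno; /* word number that bm_cur_word covers */
    uint32      bm_cur_word;    /* literal word currently being filled */
    uint32     *bm_words;       /* completed compressed words */
    int         bm_nwords;
    int         bm_maxwords;
} BMBuildVector;

static void
bm_append_word(BMBuildVector *vec, uint32 word)
{
    if (vec->bm_nwords >= vec->bm_maxwords)
    {
        vec->bm_maxwords = Max(16, vec->bm_maxwords * 2);
        vec->bm_words = (vec->bm_words == NULL)
            ? palloc(vec->bm_maxwords * sizeof(uint32))
            : repalloc(vec->bm_words, vec->bm_maxwords * sizeof(uint32));
    }
    vec->bm_words[vec->bm_nwords++] = word;
}

/*
 * Set the bit for "tidno", a monotonically increasing bit position derived
 * from the heap TID.  Any run of all-zero words between the previous bit
 * and this one collapses into a single fill word (assuming the run length
 * fits in 31 bits).
 */
static void
bm_set_bit(BMBuildVector *vec, uint64 tidno)
{
    uint64      wordno = tidno / HRL_BITS_PER_WORD;

    if (wordno > vec->bm_last_wordno)
    {
        uint64      gap = wordno - vec->bm_last_wordno - 1;

        bm_append_word(vec, vec->bm_cur_word);      /* flush literal word */
        if (gap > 0)
            bm_append_word(vec, HRL_FILL_FLAG | (uint32) gap);
        vec->bm_cur_word = 0;
        vec->bm_last_wordno = wordno;
    }
    vec->bm_cur_word |= (uint32) 1 << (tidno % HRL_BITS_PER_WORD);
}

The point is that poorly clustered input only costs a handful of words per
distinct value at any moment, instead of an ever-growing TID array that
pushes us into maintenance_work_mem territory.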
