Re: Bitmap index thoughts

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index thoughts
Date: 2006-12-27 11:10:49
Message-ID: Pine.LNX.4.58.0612272158450.23933@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 27 Dec 2006, Heikki Linnakangas wrote:

> Gavin Sherry wrote:
> > On Tue, 26 Dec 2006, Heikki Linnakangas wrote:
> >> for typical bitmap index use cases and most of the needed pages should
> >> stay in memory, but could we simplify this? Why do we need the auxiliary
> >> heap, couldn't we just store the blk+offset of the LOV item directly in
> >> the b-tree index item?
> >
> > The problem is, the b-tree code is very much tied to the heap. I don't
> > want to modify the b-tree code to make bitmap indexes work (better).
> > What's really tempting is to just manage our own balanced tree within the
> > bitmap index file(s) itself. It would start from the metapage and simply
> > spill to other 'special' index pages when necessary. The problem is, we do
> > not have b-tree code generic enough that it would allow us to do this
> > trivially -- consider concurrency and WAL in particular, which we
> > currently get for free. I guess this is why I've been ignoring this issue
> > :-).
>
> Maybe we could reuse the code in ginbtree.c. Looks like Teodor & Oleg
> had the same problem :).
>
> Modifying the nbtree code doesn't seem that difficult either. AFAICS,
> the only places where the heap is from within nbtree code is in index
> building and uniqueness checks.

I'll take a look and think about it.

>
> >> And instead of having separate LOV pages that store a number of LOV
> >> items, how about storing each LOV item on a page of it's own, and using
> >> the rest of the page to store the last chunk of the bitmap. That would
> >> eliminate one page access, but more importantly, maybe we could then get
> >> rid of all the bm_last_* attributes in BMLOVItemData that complicate the
> >> patch quite a bit, while preserving the performance.
> >
> > That's an interesting approach. We would still need a concept of
> > last_word, at the very least, and probably last_comp_word for convenience.
>
> Why?

The two last words of the bitmap vector, stored in the LOV item, provide a
few things: they buffer the addition of new matches in the vector and they
ease the calculation of the current position of the end of the vector.
Jie, do you want to add more?

I think we could do this differently by calculating everything based on
the other data stored in the lov item and data page (last_tid_location and
num_hrl_words_used, from memory). But, I'm not sure. Jie?

Thanks,

Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2006-12-27 11:16:54 Re: Bitmap index thoughts
Previous Message Heikki Linnakangas 2006-12-27 10:58:55 Re: Bitmap index thoughts