Re: plans for bitmap indexes?

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Hannu Krosing <hannu(at)inversion(dot)ee>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: plans for bitmap indexes?
Date: 2004-10-26 21:18:04
Message-ID: 1098825483.2643.28.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On T, 2004-10-26 at 23:53, Hannu Krosing wrote:
> On T, 2004-10-26 at 18:42, Greg Stark wrote:
> > Hannu Krosing <hannu(at)tm(dot)ee> writes:
> >
> > > I repeat here my earlier proposal of making the bitmap indexes
> > > page-level and clustering data automatically on AND of all defined
> > > bitmap indexes.
> >
> > The problem with page-level bitmaps is that they could be much less effective.
> > Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ?" where each of
> > those matches about 1% of your tuples. If you have 100 tuples per page then
> > each of those bitmaps will find a tuple in about half the pages. So the
> > resulting AND will find about 1/8th of the pages as candidates. In reality the
> > number of pages it should have to fetch should be more like 1 in a million.

Another way to solve the unequal number of tuples per page problem was
to have a fixed length bitmap with rollower (mod length) for each page.

So your 100 tuples per page on average table should get a 32 (or 64)
bits per page bitmap where bits 1, 33, 65 and 97 would all be in the
same position (for 32 bits), but one could still do fast ANDS and ORS
with high degree of accuracy.

I guess the per-page clustering idea described in my previous mail can
even be extended inside the pages (i.e. cluster on same bits in
2/4/8/16/32bit page bitmap) if simple per/page bitmaps would waste too
much space (many different values, few actual rows - i.e. not a good
candidate for real bitmap indexes ;-p )

> > The other problem is that for persist on-disk indexes they require more work
> > to update. You would have to recheck every other tuple in the page to
> > recalculate the bit value instead of just being able to flip one bit.
>
> Read again ;)
>
> the per-page clustering would make sure that all the tuples would be on
> 1 (or on a few) pages.
>
> and what comes to updating the index, you have to do it only once per
> 100 pages ;)

This kind of clustering index works best when created on an empty table,
so all tuples can be inserted on their rightful pages.

If this kind of BM index is created on a table with some data, we need
an additional bitmap for "gray" pages - that is pages containing tuples
matching several combinations of index bits.

The way to sharpen a table with "gray" pages would be either a CLUSTER
command or VACUUM (which could check for same-bit-combination-ness.

At least an empty page would be initially (or after becoming empty
during vacuum) marked non-gray and it should also never become gray
unless a new bitmap index is added.

-----------------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andre Maasikas 2004-10-26 21:58:49 Re: plans for bitmap indexes?
Previous Message Hannu Krosing 2004-10-26 20:53:44 Re: plans for bitmap indexes?