Re: Bitmap index stuff

From: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index stuff
Date: 2007-02-26 23:07:43
Message-ID: Pine.LNX.4.58.0702270957200.27992@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 26 Feb 2007, Heikki Linnakangas wrote:

> Hi,
>
> How are you doing with the bitmap indexes?

I need to send of a patch fixing the last bug you pointed out. The code
needs a merge of HEAD.

>
> I've been trying to get my head around the patch a couple of times to
> add the vacuum support, but no matter how simple I try to keep it, I
> just always seem to get stuck.
>
> It looks like vacuum support would need:
>
> - something similar to read_words, let's call it iterate_set_bits, that
> returns each set bit from a bitmap vector, keeping the buffer locked
> over calls.
> - ability to clear the bit returned by iterate_set_bits. If normal index
> scans also used this, the same functions could be used to support the
> kill_prior_tuple thingie.

Okay.

>
> The above also needs to be able to recompress a page if it gets
> fragmented by repeated setting and clearing of bits.

Yes.

> I still feel that the data structures are unnecessarily complex. In
> particular, I'd like to get rid of the special-cased last_word and
> last_comp_word in the lov item. Perhaps we could instead embed a normal,
> but smaller, BMBitmapData structure in the lov item, and just add a
> length field to that?

I'm not sure that this really simplifies the code. I agree things could be
simpler though.

>
> You have a lot of code to support efficient building of a bitmap index.
> I know you've worked hard on that, but do we really need all that? How
> did the bitmap build work in the previous versions of the patch, and how
> much faster is the current approach?

I included details on a previous email, I thought. Basically, in cases
where the data is distributed as follows:

1 1 1 1 1 1 1 .... 2 2 2 2 2 2 2 .... 3 3 3 3 3 3 3 3 ...

We're very fast in both versions. If the data is distributed as:

1 2 3 4 5 6 .... 1 2 3 4 5 6

In the original version(s), we were terribly slow (in my test, 7 times
slower than btree). Considering the kind of data sets bitmap suits, this
made bitmap unusable. With the rewrite, we're much faster (in my test,
faster than btree).

The test case was: a table with 600M rows with 100,000 distinct keys to be
indexed.

> BTW: It occured to me that since we're piggybacking on b-tree's strategy
> numbers, comparison operators etc, conceivably we could also use any
> other indexam. For example, a bitmap GiST would be pretty funky. We'll
> probably leave that for future versions, but have you given that any
> thought?

True. I haven't given it any thought though. Interesting... I'd have to
think of some interesting data sets which would suit the capabilities
(operators) we have with GiST.

Thanks,

Gavin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-02-26 23:11:33 Re: Acclerating INSERT/UPDATE using UPS
Previous Message Joshua D. Drake 2007-02-26 23:06:19 Re: COMMIT NOWAIT Performance Option