Re: On-disk bitmap index implementation

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: On-disk bitmap index implementation
Date: 2006-12-04 18:13:46
Message-ID: C199A55A.C246%jzhang@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches


On 12/4/06 8:22 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:

> Gavin Sherry wrote:
>> Hi all,
>>
>> Attached is a patch implementing bitmap indexes. It includes major
>> enhancements on the patch submitted during feature freeze for 8.2 here[1].
>>
>> In particular: much better integration with the existing bitmap scan code
>> with the internals of the bitmap streaming pushed down into the AM and
>> hidden from the executor code; completely new index creation algorithm
>> which reduced creation time by 20-75% depending on the data; modifications
>> to the encoding mechanism to suit the integration with bitmap index scans;
>> work on memory management; lots of code rewriting; range query support.
>> The code is also much cleaner now.
>
> Thanks! I'll take a look at it.

Thanks!

>
> We need to give the indexam API some further thought. As you know, I've
> been working on the Grouped Index Tuples stuff, which also requires
> changes to the API to get full benefit. There's a bunch of functionality
> I'd like to see:
>
> * Support for streamed bitmaps, like you have implemented.

Yes. It is also interesting to see how this type of stream bitmaps works the
one from the bitmap index.

>
> * Support for candidate matches. This is needed for GIT, as well as
> range-encoded bitmap indexes if/when we add them.
>

We have replace amgetmulti with amgetbitmap. This should be supported by
setting PagetableEntry accordingly.

> * Support for returning tuples in partial order. This is again needed
> for GIT, because grouped tuples don't keep track of the ordering within
> the group, so they need to be sorted if ordering necessary. And again
> it's also useful to return tuples in order from range-encoded bitmaps.

The bitmap index does not guarantee that returning tuples are ordered, but
it guarantees that the tuples from the same heap page will be returned
consecutively. I don't quite understand why this is useful for
range-encoding bitmaps in particular.

>
> * Support for kill_prior_tuple on bitmap scans.
>
> * A bulk insert API. When inserting a lot of tuples with similar keys,
> we could a considerable amount of CPU with a bulk insert API. A bulk
> insert to a B-tree for example would only need to descend the tree once,
> find the insert location, lock the target page just once and insert all
> the tuples that belong to that page. That would potentially also reduce
> WAL traffic.
>

Yes. Currently during the bitmap index creation, we maintain a buffer to
buffer many inserting tuples before writing them to bitmap pages, which
improves the creation performance by 30%-200% depending on the cardinalities
of the attribute to be indexed. This bulk insert API can take advantage of
this as well.

Thanks,
Jie

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2006-12-04 18:19:03 Re: Bundle of patches
Previous Message Chris Browne 2006-12-04 17:40:28 Re: ECPG docs