|From:||"Jie Zhang" <jzhang(at)greenplum(dot)com>|
|To:||"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>|
|Subject:||Re: On-disk bitmap index implementation|
|Views:||Raw Message | Whole Thread | Download mbox|
On 12/4/06 8:22 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:
> Gavin Sherry wrote:
>> Hi all,
>> Attached is a patch implementing bitmap indexes. It includes major
>> enhancements on the patch submitted during feature freeze for 8.2 here.
>> In particular: much better integration with the existing bitmap scan code
>> with the internals of the bitmap streaming pushed down into the AM and
>> hidden from the executor code; completely new index creation algorithm
>> which reduced creation time by 20-75% depending on the data; modifications
>> to the encoding mechanism to suit the integration with bitmap index scans;
>> work on memory management; lots of code rewriting; range query support.
>> The code is also much cleaner now.
> Thanks! I'll take a look at it.
> We need to give the indexam API some further thought. As you know, I've
> been working on the Grouped Index Tuples stuff, which also requires
> changes to the API to get full benefit. There's a bunch of functionality
> I'd like to see:
> * Support for streamed bitmaps, like you have implemented.
Yes. It is also interesting to see how this type of stream bitmaps works the
one from the bitmap index.
> * Support for candidate matches. This is needed for GIT, as well as
> range-encoded bitmap indexes if/when we add them.
We have replace amgetmulti with amgetbitmap. This should be supported by
setting PagetableEntry accordingly.
> * Support for returning tuples in partial order. This is again needed
> for GIT, because grouped tuples don't keep track of the ordering within
> the group, so they need to be sorted if ordering necessary. And again
> it's also useful to return tuples in order from range-encoded bitmaps.
The bitmap index does not guarantee that returning tuples are ordered, but
it guarantees that the tuples from the same heap page will be returned
consecutively. I don't quite understand why this is useful for
range-encoding bitmaps in particular.
> * Support for kill_prior_tuple on bitmap scans.
> * A bulk insert API. When inserting a lot of tuples with similar keys,
> we could a considerable amount of CPU with a bulk insert API. A bulk
> insert to a B-tree for example would only need to descend the tree once,
> find the insert location, lock the target page just once and insert all
> the tuples that belong to that page. That would potentially also reduce
> WAL traffic.
Yes. Currently during the bitmap index creation, we maintain a buffer to
buffer many inserting tuples before writing them to bitmap pages, which
improves the creation performance by 30%-200% depending on the cardinalities
of the attribute to be indexed. This bulk insert API can take advantage of
this as well.
|Next Message||Tom Lane||2006-12-04 18:19:03||Re: Bundle of patches|
|Previous Message||Chris Browne||2006-12-04 17:40:28||Re: ECPG docs|