Skip site navigation (1) Skip section navigation (2)

Re: On-disk bitmap index implementation

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>
Cc: <pgsql-patches(at)postgresql(dot)org>,"Jie Zhang" <jzhang(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index implementation
Date: 2006-12-04 16:22:52
Message-ID: 45744B5C.8030308@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-patches
Gavin Sherry wrote:
> Hi all,
> 
> Attached is a patch implementing bitmap indexes. It includes major
> enhancements on the patch submitted during feature freeze for 8.2 here[1].
> 
> In particular: much better integration with the existing bitmap scan code
> with the internals of the bitmap streaming pushed down into the AM and
> hidden from the executor code; completely new index creation algorithm
> which reduced creation time by 20-75% depending on the data; modifications
> to the encoding mechanism to suit the integration with bitmap index scans;
> work on memory management; lots of code rewriting; range query support.
> The code is also much cleaner now.

Thanks! I'll take a look at it.

We need to give the indexam API some further thought. As you know, I've
been working on the Grouped Index Tuples stuff, which also requires
changes to the API to get full benefit. There's a bunch of functionality
I'd like to see:

* Support for streamed bitmaps, like you have implemented.

* Support for candidate matches. This is needed for GIT, as well as
range-encoded bitmap indexes if/when we add them.

* Support for returning tuples in partial order. This is again needed 
for GIT, because grouped tuples don't keep track of the ordering within 
the group, so they need to be sorted if ordering necessary. And again 
it's also useful to return tuples in order from range-encoded bitmaps.

* Support for kill_prior_tuple on bitmap scans.

* A bulk insert API. When inserting a lot of tuples with similar keys, 
we could a considerable amount of CPU with a bulk insert API. A bulk 
insert to a B-tree for example would only need to descend the tree once, 
find the insert location, lock the target page just once and insert all 
the tuples that belong to that page. That would potentially also reduce 
WAL traffic.

> There are still some things Jie and I have not gotten to yet:
> 
> o Improving VACUUM support -- currently, VACUUM FULL means REINDEX for
>   bitmaps. Heikki Linnakangas offered to work on this. Heikki, are you
>   still interested?

Yeah, I can look into that.

> o Test WAL replay more thoroughly.

I've had that problem too with a lot of things I've hacked. I've used a 
shell script that does the operation under test, runs a select, kills 
and restarts postmaster, and reruns the select. If the select after 
crash returns the same result as before, presumably WAL code works. But 
you need to watch out for full page writes that might mask bugs in the 
redo code.

Anyone have a more sophisticated method?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


In response to

Responses

pgsql-patches by date

Next:From: Heikki LinnakangasDate: 2006-12-04 16:33:44
Subject: Re: ECPG docs
Previous:From: Chris BrowneDate: 2006-12-04 15:33:47
Subject: ECPG docs

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group