Re: Bitmap index thoughts

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index thoughts
Date: 2007-02-06 10:45:13
Message-ID: 45C85C39.5070401@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gavin Sherry wrote:
> On Thu, 1 Feb 2007, Bruce Momjian wrote:
>
>> Where are we on this patch? Does it have performance tests to show
>> where it is beneificial? Is it ready to be reviewed?
>
> Here's an updated patch:
>
> http://www.alcove.com.au/~swm/bitmap-2007-02-02.patch
>
> In this patch, I rewrote the index build system. It was fast before for
> well clustered data but for poorly clustered data, it was very slow. Now,
> it is pretty good for each distribution type.
>
> I have various test cases but the one which showed bitmap a poor light was
> a table of 600M rows. The key to the table had a cardinality of 100,000.
> When the table was loaded with keys clustered, the build time was 1000
> seconds with bitmap (2200 with btree). With poorly clustered data (e.g.,
> the index key was (1, 2, 3, ..., 6000, 1, 2, 3, ...)), the build time for
> bitmap was 14000 seconds!
>
> So, I rewrote this to compress data using HRL encoding (the same scheme we
> use in the bitmap AM itself). Now, clustered data is just as fast and
> unclustered data is 2000 seconds.
>
> The select performance at a cardinality of 100,000 is similar to btree but
> faster with lower cardinalities.
>
> Jie also contributed a rewrite of the WAL code to this patch. Not only is
> the code faster now, but it handles the notion of incomplete actions --
> like btree and friends do. The executor code still needs some work from me
> -- Jie and I have dirtied things up while experimenting -- but we would
> really like some review of the code so that this can get squared away
> well before the approach of 8.3 feature freeze.
>
> One of the major deficiencies remaining is the lack of VACUUM support.
> Heikki put his hand up for this and I'm holding him to it! ;-)

Thanks :). I'll take a look at it.

I'm a bit worried that vacuuming can get complicated if an index is in
fact an index + a heap + a btree. To remove empty lov items and the
entries in the auxiliary heap and b-tree, you need to:

1. Memorize empty lov-items
2. Scan the heap, and mark the heap tuples corresponding the empty
lov-items as dead
3. Scan the b-tree, removing pointers to dead heap tuples
4. Remove dead heap tuples
5. Remove empty lov-items

Maybe it's possible to call the existing vacuuming code recursively, but
it feels quite horrible.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2007-02-06 11:17:08 Re: Pl/pgsql functions causing crashes in 8.2.2
Previous Message Marko Kreen 2007-02-06 10:33:24 Re: Pl/pgsql functions causing crashes in 8.2.2