From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Gavin Sherry <swm(at)alcove(dot)com(dot)au> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bitmap index thoughts |
Date: | 2007-02-06 10:45:13 |
Message-ID: | 45C85C39.5070401@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Gavin Sherry wrote:
> On Thu, 1 Feb 2007, Bruce Momjian wrote:
>
>> Where are we on this patch? Does it have performance tests to show
>> where it is beneificial? Is it ready to be reviewed?
>
> Here's an updated patch:
>
> http://www.alcove.com.au/~swm/bitmap-2007-02-02.patch
>
> In this patch, I rewrote the index build system. It was fast before for
> well clustered data but for poorly clustered data, it was very slow. Now,
> it is pretty good for each distribution type.
>
> I have various test cases but the one which showed bitmap a poor light was
> a table of 600M rows. The key to the table had a cardinality of 100,000.
> When the table was loaded with keys clustered, the build time was 1000
> seconds with bitmap (2200 with btree). With poorly clustered data (e.g.,
> the index key was (1, 2, 3, ..., 6000, 1, 2, 3, ...)), the build time for
> bitmap was 14000 seconds!
>
> So, I rewrote this to compress data using HRL encoding (the same scheme we
> use in the bitmap AM itself). Now, clustered data is just as fast and
> unclustered data is 2000 seconds.
>
> The select performance at a cardinality of 100,000 is similar to btree but
> faster with lower cardinalities.
>
> Jie also contributed a rewrite of the WAL code to this patch. Not only is
> the code faster now, but it handles the notion of incomplete actions --
> like btree and friends do. The executor code still needs some work from me
> -- Jie and I have dirtied things up while experimenting -- but we would
> really like some review of the code so that this can get squared away
> well before the approach of 8.3 feature freeze.
>
> One of the major deficiencies remaining is the lack of VACUUM support.
> Heikki put his hand up for this and I'm holding him to it! ;-)
Thanks :). I'll take a look at it.
I'm a bit worried that vacuuming can get complicated if an index is in
fact an index + a heap + a btree. To remove empty lov items and the
entries in the auxiliary heap and b-tree, you need to:
1. Memorize empty lov-items
2. Scan the heap, and mark the heap tuples corresponding the empty
lov-items as dead
3. Scan the b-tree, removing pointers to dead heap tuples
4. Remove dead heap tuples
5. Remove empty lov-items
Maybe it's possible to call the existing vacuuming code recursively, but
it feels quite horrible.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Marko Kreen | 2007-02-06 11:17:08 | Re: Pl/pgsql functions causing crashes in 8.2.2 |
Previous Message | Marko Kreen | 2007-02-06 10:33:24 | Re: Pl/pgsql functions causing crashes in 8.2.2 |