Re: On-disk bitmap index patch

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-28 22:47:30
Message-ID: 1154126850.2961.35.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, R, 2006-07-28 kell 16:18, kirjutas Tom Lane:
> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> > Others have looked into the usefulness of bitmap indexes. Here is what
> > they found:
> > http://www.oracle.com/technology/pub/articles/sharma_indexes.html
>
> I like this guy's style of argument: he admits a bitmap index on a
> unique column will be much bigger than a btree, and then airily
> dismisses it as not a problem. Not real convincing.

This problem can be easyly avoided by not creating bitmap indexes on
unique columns. So I think it is ok to dismiss it.

> > http://citeseer.ist.psu.edu/stockinger02bitmap.html
>
> Both of these pages say up front that they are considering read-only
> data. So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance? If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.

May be.

OTOH, in OLAP databases you may be better off dropping the indexes
before data loading and rebuilding them after. And it has been shown
that bitmap indexes build a lot faster than btrees.

> In any case, arguing that other DBs find it's a win will cut no ice
> with me.

How about a more general argument. I claim that an index that is small
and fits in RAM is faster than a big one that does not fit in RAM.

> See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses. The translation gap between other DBs' experience and ours
> can be large.

IIRC the tests showing bitmap indexes being much faster on TPC-H were
done on postgresql, no ?

You pointed out that btree indexes are more bloated in this case as they
store padding spaces for all CHAR(N) fields whereas bitmap index stores
padding spaces only once for each distinct value.

Are there any plans to start optimising btree storage model in
forseeable future ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2006-07-28 22:54:08 Re: On-disk bitmap index patch
Previous Message Jim Nasby 2006-07-28 22:43:03 Re: The vacuum-ignore-vacuum patch