Re: On-disk bitmap index patch

From: mark(at)mark(dot)mielke(dot)cc
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:58:05
Message-ID: 20060725015805.GA30087@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote:
> Jie Zhang wrote:
> > > IIRC they quoted the cardinality of 10000 as something that is still
> > > faster than btree for several usecases.
> > >
> > > And also for AND-s of several indexes, where indexes are BIG, your btree
> > > indexes may be almost as big as tables but the resulting set of pages is
> > > small.
> > Yeah, Hannu points it out very well -- the bitmap index works very well
> > when columns have low cardinalities and AND operations will produce small
> > number of results.
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.

Not necessarily. Cardinality of 2 means 1/2. 3 means 1/3. 4 means 1/4. Etc.

Reading 1/4, for a larger table, has a good chance of being faster than
reading 4/4 of the table. :-)

No opinion on whether such tables exist in the real world - but the
concept itself seems sound.

> > Also, the bitmap index is very small in low cardinality cases, where the
> > btree tends to take up at least 10 times more space.
> Also, are adding/changing rows is more expensive with bitmaps than
> btrees?

Without looking at the code, but having read the Oracle docs, I would
guess yes. Every vacuum/delete would need to clear all of the bits for
the row. Every insert/update would need to allocate a bit in at least
the bitmap tree for the row inserted/updated. Seems like more pages
may need to be written. Although perhaps some clever optimization
would limit this. :-)

It seems interesting though. We won't really know until we see the
benchmarks. I'm seeing it as a form of working directly with the
intermediate form of the bitmap index scanner. If the existing index
scanner, building the bitmaps on the fly can out-perform the regular
index scanner, I'm seeing potentially in a pre-built bitmap.

Obviously, it isn't the answer to everything. The use I see for it,
are a few of my 1:N object attribute tables. The table has an object
identifier, and a column indicating the attribute type that the value
is for. If I have 20 or more attribute type values, however, most
objects include rows for most attribute types, my regular index ends
up repeating the attribute type for every row. If I want to scan the
table for all rows that have a particular attribute type with a
particular value, it's a seqscan right now. With the bitmap scanner,
knowing which rows to skip to immediately is readily available.

Will it be worth it or not? I won't know until I try it. :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-07-25 01:59:41 Re: LDAP patch & feature freeze
Previous Message Gavin Sherry 2006-07-25 01:51:05 Re: On-disk bitmap index patch