Re: bitmap AM design

From: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bitmap AM design
Date: 2005-03-04 23:03:04
Message-ID: 20050304230304.GA6276@mits.lv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Some more thoughts and questions.

The main idea above bitmaps is narrowing some data sets' possible values to
"yes" and "no" (i.e. 1 and 0) and then organize the data in the series of
bits, where bit's position determines values to consider. In the cases, where
several indexed attributes are used in WHERE clause, it's possible to do
logical AND/OR on bitmaps before returning any results to the caller. For
large tables with high number of low-cardinality attributes using bitmaps can
result in certain speed-up.

For on-disk bitmaps I'm working on, each value of each indexed attribute has
it's own bitmap (i.e. series of bits, with bits set to 1 for rows with
corresponding fields having value of that bitmap). Scanning the bitmap, we end
up with an array of "1-bits' positions" and need to convert those positions to
CTIDs, as executor is expecting. So, index should also keep a CTID table, that
corresponds to the bitmap's data. This CTID table will be the same for all
bitmap indexes, created for one table, thus having 2 bitmap indexes will mean
you're wasting some amount of disk space, storing absolutely identical data.
So, to save space, we have 2 possibilities:
1) create a new relkind for the CTID table (maybe used not only for on-disk
bitmaps);
2) make all "create index ... using bitmap" statements actually create/extend
existing bitmap index. This also implies, that planner/executor should
try using multicolumn bitmap index when at least one indexed field is
present in the WHERE clause.

I'm working on the 2nd case, because 1st one requires more work not only in
the access method + executor + planner area. It is also possible to keep
things as is and make a note in the documentation, that it is better to have 1
multicolumn bitmap index, then several single column ones, and that planner
will still use multicolumn index even if not all columns are involved.

Any comments/ideas here?

After implementing bitmap index access method, it'll be necessary to teach
planner and executor to use multicolumn bitmaps for any number of
scan-attributes. Also, I cannot say in what circumstances planner should
prefer bitmap scan to seqscan; I thought of cases, when it estimates return
set being about 60% of the relation. What community has to say here?

Also, as Tom is planning to work on in-memory bitmaps (maybe something is
done already, don't know), I thought that it can be possible to cooperate.
As I have no clue at the moment how in-memory bitmaps are going to work, is it
possible to hear from you some draft of the forthcoming implementation?
And what prerequisites would be required to join both bitmaps somehow?

Waiting for your thoughts/comments.

--

Victor Y. Yegorov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2005-03-05 01:10:52 Re: 8.0.X and the ARC patent
Previous Message Tom Lane 2005-03-04 20:42:26 Re: buildfarm issues