Re: [PATCHES] WIP: bitmap indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP: bitmap indexes
Date: 2006-08-16 13:15:41
Message-ID: 13399.1155734141@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> On 8/15/06 6:18 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, as I said, I don't think there's justification for exposing a
>> bitmap index's internal data formats to the rest of the system like
>> that: it's not very future-proof and I don't see that it's buying any
>> significant performance gain.

> The bitmap words in the bitmap index are very simple and can be very
> generic.

They're not generic in the least: there's a compression scheme involved
that you might want to whack around at any time. So I disagree with the
idea that it's OK to expose the format outside the access/bitmap/ module.

> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap.

It occurs to me that what tbm_begin_iterate really is is a constructor
for a stream bitmap object that reads out the contents of a tbm bitmap
(we need a decent name for the non-stream data structure ... maybe
hash bitmap?). If we think of it like that then we can unify the
ideas some more.

My proposal at this point would be to invent two different Node types,
one for stream bitmaps and one for hash bitmaps. The initial input to
nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
it stream-ifies it for use. amgetmulti can return either kind, and
nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
Preserving the existing optimization for ORing hash bitmaps is a bit
tricky but I think it's doable. Consider this API for amgetmulti:

amgetmulti takes an argument which can be either a hash bitmap or NULL.
It returns an object that must be either a hash or stream bitmap.
If it wants to return a stream bitmap, it simply disregards the argument
and returns a constructed stream-bitmap object. If it wants to return
a hash bitmap, then if the argument is not NULL, OR the additional bits
into the argument object and return it; if the argument is NULL,
construct a fresh hash-bitmap object, set bits in it, and return it.

Assume that we have the existing hash-bitmap AND/OR functions as well as
constructors for AND and OR stream bitmaps that take lists of input
stream objects. Then the algorithm for nodeBitmapOr looks like this:

HashBitmap *hashBitmap = NULL;
List *streamBitmaps = NIL;

foreach(input plan)
{
Node *newBitmap = amgetmulti(hashBitmap);

if (IsA(newBitmap, HashBitmap))
{
// any OR-ing required was done implicitly
hashBitmap = newBitmap;
}
else
{
Assert(IsA(newBitmap, StreamBitmap));
streamBitmaps = lappend(streamBitmaps, newBitmap);
}
}

if (streamBitmaps == NIL)
{
// all inputs returned hash, so we're done
return hashBitmap;
}
else
{
// need a stream OR operation atop the inputs
if (hashBitmap)
streamBitmaps = lappend(streamBitmaps,
HashToStreamBitmap(hashBitmap));
return ConstructStreamOr(streamBitmaps);
}

nodeBitmapAnd is a bit different but not any harder.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Volkan YAZICI 2006-08-16 13:20:46 Re: "cache reference leak" and "problem in alloc set" warnings
Previous Message Andrew Dunstan 2006-08-16 13:14:47 Re: BugTracker (Was: Re: 8.2 features status)

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2006-08-16 15:17:46 Re: CREATE INDEX ... ONLINE
Previous Message Tom Lane 2006-08-16 12:51:21 Re: [HACKERS] Forcing current WAL file to be archived