Re: Bitmap index status

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jie Zhang <jzhang(at)greenplum(dot)com>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap index status
Date: 2006-09-26 09:17:10
Message-ID: 4518F016.8080709@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Looks a bit better now, though I think you need to think more about the
encapsulation of the structs. More detailed comments below.

Jie Zhang wrote:
> Essentially, we want to have a stream bitmap object that has an iterator,
> which will be able to iterate through the bitmaps. The BitmapIndexscan,
> BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
> hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate
> through
> the bitmaps.
>
> The StreamBitmap structure will look like below.
>
> struct StreamBitmap {
> NodeTag type; /* to make it a valid Node */
> PagetableEntry entry; /* a page of tids in this stream bitmap */
>
> /* the iterator function */
> void (*next)(StreamBitmap*);
> Node* state; /* store how this stream bitmap generated,
> and all necessary information to
> obtain the next stream bitmap. */
> };

I'd suggest making state just a (void *). It's private to the producer
of the bitmap, and I don't see a reason to expose it. I assume that the
next-function fills in the PageTableEntry with the next set of tids.

> Two new state objects will look like below. At the same time, we introduce
> three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
> And T_StreamBitmapIndex, to define different states.
>
> /*
> * Stores the necessary information for iterating through the stream
> bitmaps
> * generated by nodeBitmapAnd or nodeBitmapOr.
> */
> struct StreamBitmapOp {
> NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */
> List* bitmaps;
> };

AFAICS, this struct is private to tidbitmap.c, where the new
stream-enabled tbm_intersect etc. functions are defined. Also, I don't
see a reason why it needs to by a valid Node.

> /*
> * Stores some necessary information for iterating through the stream
> * bitmaps generated by nodeBitmapIndexscan.
> */
> struct StreamBitmapIndex {
> NodeTag type; /* handle T_StreamBitmapIndex */
> IndexScanDesc scan;
> BlockNumber nextBlockNo;/* next block no to be read */
> };

Where would this struct be defined? I think different index access
methods might want to have different kind of states. The struct above
assumes that the position of an index scan is always represented by the
nextBlockNo. That seems to be the right thing for the bitmap indexam, so
this struct is fine for StreamBitmaps returned by bmgetbitmap, but not
necessary for others.

> Then we will have the iterator functions like the following:
>
> ...
>
> void StreamBitmapIndexNext(StreamBitmap* node) {
> StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state;
> amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo);
> }

This means that the amgetbitmap function would still be called many
times in each scan. What would amgetbitmap return? A new StreamBitmap
on each call?

I'd like to see just one call to amgetbitmap. It would a) fill in the
hash bitmap passed to it, b) return a new hash bitmap, or c) return a
new StreamBitmap, with a indexam specific next-function that returns the
tids one page at a time. I think we'll also need some kind of a
destructor in StreamBitmap that's called by the consumer of the bitmap
after it's done with it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joachim Wieland 2006-09-26 09:26:38 Re: Buildfarm alarms
Previous Message Jeremy Drake 2006-09-26 08:58:47 Re: [HACKERS] large object regression tests