Re: bitmap AM design

From: pgsql(at)mohawksoft(dot)com
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql(at)mohawksoft(dot)com, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bitmap AM design
Date: 2005-03-01 18:42:09
Message-ID: 16535.24.91.171.78.1109702529.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> pgsql(at)mohawksoft(dot)com writes:
>> Tom, I posted a message about a week ago (I forget the name) about a
>> persistent reference index, sort of like CTID, but basically a table
>> lookup. The idea is to simulate a structure that ISAM sort of techniques
>> can work in PostgreSQL.
>
>> Eliminating the bitmap index issue for a moment, how hard would it be to
>> create a reference table like index?
>
> I didn't see the point. You cannot actually use CTID that way (at least
> not without fundamentally redesigning our MVCC machinery) and anything
> else you might come up with is effectively just a random unique ID that
> has to be mapped through an index to find the row. I don't see the
> advantage compared to any ordinary application-defined primary key.

I have a search engine product which does use primary key based number.
The search engine also uses an external bitmap index. Here is the process:

Parse incoming text into discrete words.
Look up each word and retrieve its bitmap.
Combine all the bitmaps using the appropriate logical functions (AND, OR,
etc)
list out all the "1s" from the bitmaps as an entry into a table which
points to the primary key number.
Find all the records in the database with all the primary keys, sometimes
hundreds or thousands of entries in a "WHERE IN (...)" clause.
Now, after I've done all this logical work getting document numbers, I
have to do an index lookup for each one (or, god forbid, a full table
scan!)
This can be a long process, longer than actually doing the text search
with the bitmaps in the first place.

A "persistent reference" index would be like almost any other index except
that as new items are added to a table a new entry is added to the end of
the index. When a row is updated, its CTID is updated in the table. When
you run vacuum, you just update the CTID in the table as if it were any
other index. When you delete an item, you clear the CDID value of the
table entry. You can do "VACUUM COMPRESS INDEX myindex" which will
re-order and compact the persistent reference.

I know from a purely SQL standpoint, it sounds whacky, but from a VAR or
consultants point of view, it can really increase performance of some
products. That last "WHERE IN" clause of my search engine can take several
tens of seconds to run, but the actual search engine only took 0.03
seconds to find the documents. A persistent reference system would
eliminate tens ro thousands of index lookups per search query.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-03-01 18:51:05 Re: Where to see the patch queue (was Re: [PATCHES] Patch for Postmaster
Previous Message Tom Lane 2005-03-01 18:11:00 Re: bitmap AM design