Re: OK, does anyone have any better ideas?

From: mlw <markw(at)mohawksoft(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OK, does anyone have any better ideas?
Date: 2000-12-09 13:18:10
Message-ID: 3A323112.FCF35B3B@mohawksoft.com
Lists: pgsql-hackers pgsql-novice

Oleg Bartunov wrote:
>
> We need a multi-key B-tree-like index for this kind of problem.
> Our full-text search engine is based on arrays, and we need to find quickly
> whether some number exists in an array: some kind of index over int arrays.
> We're currently testing a GiST approach and expect to have some conclusions
> soon. In my opinion a multi-key B-tree-like index would be better, but it
> requires too much work and knowledge of Postgres internals.
> Yesterday I read about UBTree; it seems good for indexing and querying
> sets. Currently Postgres has no set-specific methods.
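To make the requirement concrete, here is a minimal in-memory sketch of an
inverted index over integer arrays: each value maps to the set of rows whose
array contains it, so "does value v occur in row r's array" becomes a
dictionary probe instead of a scan. This is only an illustration of the idea,
not GiST, a multi-key B-tree, or UBTree; `build_array_index` and
`rows_containing` are hypothetical names.

```python
from collections import defaultdict


def build_array_index(rows):
    """Map each integer value to the set of row ids whose array contains it."""
    index = defaultdict(set)
    for row_id, values in enumerate(rows):
        for v in values:
            index[v].add(row_id)
    return index


def rows_containing(index, value):
    """Return, in ascending order, the row ids whose array contains `value`."""
    # .get() does not trigger the defaultdict factory, so lookups of unknown
    # values do not grow the index.
    return sorted(index.get(value, ()))


rows = [[1, 5, 9], [2, 5], [9, 12]]   # hypothetical int-array column
idx = build_array_index(rows)
print(rows_containing(idx, 5))   # [0, 1]
print(rows_containing(idx, 9))   # [0, 2]
print(rows_containing(idx, 7))   # []
```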

The way I do my search indexing is with bitmap objects and a word
dictionary. One creates a searchable dictionary of words by scanning the
selected records. For example, in one query that returns 2 million
records, the total number of distinct words is about 60,000, depending on
how you parse. For each word, you create a "bitmap object" (in one of a
few forms) where bit '0' represents the first record, bit '1' represents
the second, and so on, until you have 2 million bits.

Set the corresponding bit in the bitmap for each document that contains
that word. In the end you will have the equivalent of 60,000 bitmaps of 2
million bits each.
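The build step can be sketched roughly as follows, assuming Python ints as
arbitrary-length bitmaps (bit i set means record i contains the word) and a
deliberately naive tokenizer; the real "bitmap objects" come in several forms
and the parser is not specified, so `tokenize` and `build_bitmaps` are
hypothetical.

```python
import re


def tokenize(text):
    # Hypothetical parser: lowercase alphanumeric runs. As noted above, the
    # dictionary size depends on how you parse.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_bitmaps(documents):
    """Return {word: bitmap}, where bit i is set if documents[i] contains word."""
    bitmaps = {}
    for doc_no, text in enumerate(documents):
        # set() so a word repeated within one document is handled once
        for word in set(tokenize(text)):
            bitmaps[word] = bitmaps.get(word, 0) | (1 << doc_no)
    return bitmaps


docs = ["the quick fox", "a lazy dog", "the lazy fox"]
bm = build_bitmaps(docs)
print(format(bm["fox"], "03b"))   # 101 (bit 0 is the rightmost digit: documents 0 and 2)
```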

At search time, one creates an empty bitmap of 2 million bits as a
workspace. One parses the search term and performs a boolean operation
on the workspace with the bitmap retrieved for each word.
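A minimal sketch of that evaluation, again using Python ints as bitmaps. The
query syntax here is an assumption made for illustration: words joined by AND
or OR, an optional NOT before a word, evaluated strictly left to right with no
parentheses or precedence. `search` is a hypothetical name.

```python
def search(query, bitmaps, n_docs):
    """Evaluate a flat query such as "fox AND NOT dog" into a result bitmap."""
    all_docs = (1 << n_docs) - 1   # one set bit per document, used for NOT
    workspace = 0                  # the empty n_docs-bit workspace
    op = "OR"                      # first word is OR-ed into the empty workspace;
                                   # a word without an operator reuses the last one
    negate = False
    for token in query.split():
        upper = token.upper()
        if upper in ("AND", "OR"):
            op = upper
        elif upper == "NOT":
            negate = True
        else:
            bits = bitmaps.get(token.lower(), 0)   # unknown word -> no documents
            if negate:
                bits = all_docs & ~bits            # complement within n_docs bits
                negate = False
            workspace = workspace & bits if op == "AND" else workspace | bits
    return workspace


# Hypothetical word bitmaps for 3 documents (bit i = document i).
bitmaps = {"the": 0b101, "quick": 0b001, "fox": 0b101,
           "a": 0b010, "lazy": 0b110, "dog": 0b010}
print(format(search("lazy AND fox", bitmaps, 3), "03b"))        # 100
print(format(search("fox AND NOT quick", bitmaps, 3), "03b"))   # 100
print(format(search("dog OR quick", bitmaps, 3), "03b"))        # 011
```

Each word costs one bitwise pass over the workspace, independent of how many
documents contain it.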

When you are done parsing, you have a bitmap with a bit set for each
document that matches the search criteria. You then enumerate the set bits
by bit position, and you now have a list of document numbers.
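Enumerating the set bits can be sketched with the lowest-set-bit trick on a
Python int. Because bit positions are ordinals in the original scan, getting
back to actual rows needs a table of the scanned records' keys; the
`record_ids` list below is a hypothetical stand-in for that table.

```python
def set_bit_positions(bitmap):
    """Yield the positions of the set bits in ascending order."""
    while bitmap:
        low = bitmap & -bitmap        # isolate the lowest set bit
        yield low.bit_length() - 1    # its position is the document number
        bitmap ^= low                 # clear that bit and continue


result = 0b100101                     # documents 0, 2 and 5 matched
print(list(set_bit_positions(result)))   # [0, 2, 5]

# Hypothetical: key of the i-th record, captured in the order the index was built.
record_ids = [101, 205, 317, 400, 512, 678]
matches = [record_ids[i] for i in set_bit_positions(result)]
print(matches)                           # [101, 317, 678]
```

For a dense 2-million-bit result, scanning a byte or machine word at a time
may be faster than clearing one bit per iteration.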

If only I could get the list of document numbers back into Postgres,
that would be great.
--
http://www.mohawksoft.com
