Re: OK, does anyone have any better ideas?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OK, does anyone have any better ideas?
Date: 2000-12-09 08:50:17
Message-ID: Pine.GSO.3.96.SK.1001209114358.4174j-100000@ra
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-novice

We need multi-key B-tree like index for such problem.
Our full text search engine is based on arrays and we need to find quickly
is some number exists in array - some kind of index over int array.
We're currently testing GiST approach and seems will have some conclusions
soon. I think multi-key B-tree like index would be better in my
opinion, but this requires to much work and knowledge of postgres's internals.
Yesterday I read about UBTree, seems like it's good for index and query
sets. Currently postgres has no set specific methods.

Regards,

Oleg
On Fri, 8 Dec 2000, mlw wrote:

> Date: Fri, 08 Dec 2000 20:17:34 -0500
> From: mlw <markw(at)mohawksoft(dot)com>
> To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Cc: Hackers List <pgsql-hackers(at)postgresql(dot)org>
> Subject: Re: [HACKERS] OK, does anyone have any better ideas?
>
> Tom Lane wrote:
> >
> > mlw <markw(at)mohawksoft(dot)com> writes:
> > > I have a working version of a text search engine. I want to make it work
> > > for Postgres (I will be releasing it GPL). It can literally find the
> > > occurrence of a string of words within 5 million records in a few
> > > milliseconds.
> >
> > Where are the records coming from? Are they inside the database?
> > (If not, why do you care about integrating this with Postgres?)
> >
> > It seems like the right way to integrate this sort of functionality
> > is to turn it into a kind of index, so that you can do
> >
> > SELECT * FROM mytable WHERE keyfield ~~~ 'search string';
> >
> > where ~~~ is the name of some operator that is associated with the
> > index. The temporary-table approach you are taking seems inherently
> > klugy, and would still be awkward even if we had functions returning
> > recordsets...
>
> Oh! Another method I tried and just could not get working was returning
> an array of integers. I as thinking about "select * from table where
> key_field in ( textsearch('bla bla') ), but I haven't been able to get
> that to work, and as per a previous post and belatedly reading a FAQ,
> this would probably still force a full table scan.
>
> Another method I thought about was having a table with some maximum
> number of zero initialized records, and trying something like:
>
> create table temp_table as select * from ts_template limit
> textsearch('bla bla', 10);
>
> select filltable(temp_table, 10);
>
> select * from table where key_field = temp_table.key;
>
> As you can see, all of these ideas are heinous hacks, there has to be a
> better way. Surely someone has a better idea.
>
> --
> http://www.mohawksoft.com
>

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jarmo Paavilainen 2000-12-09 10:28:35 SV: Bug in index scans with Locale support enabled
Previous Message Philip Warner 2000-12-09 07:53:18 Re: 7.0.3(nofsync) vs 7.1

Browse pgsql-novice by date

  From Date Subject
Next Message Gérard TOURRES 2000-12-09 12:39:11 Query optimisation
Previous Message mlw 2000-12-09 02:40:14 Re: OK, does anyone have any better ideas?