Re: Full text Indexing -out of contrib and into main..

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: john huttley <john(at)mwk(dot)co(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Don Baccus <dhogaza(at)pacifier(dot)com>, Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 09:27:51
Message-ID: 3A237A97.D10C9E20@tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

john huttley wrote:
>
> > I believe that it is appropriate for contrib/ because it is a good demo
> > of FTI-like capabilities. But nothing more, yet. For at least a couple
> > of reasons:
> >
> > 1) It generates the "index" as a table, not a PostgreSQL index or
> > index-like thing.
> >
> > 2) It has a hardcoded list of non-indexed words. This should come from a
> > table, to allow it to be tuned to the application requirements.
> >
> > Comments?
> >
> > - Thomas
> >
>
> In general..
> a) Considering that I was coding up the same thing with triggers and such,
> things could only get better.

AFAIK, the one in contrib _is_ the same thing coded up with triggers and
such ;)

> b) Check out MSSQL 7's capabilities and weep.

BTW, have you studied MSSQL enough to tell me if it has a
separate/standalone
(as a process) fti engine or just another index type.

I have been contemplating about implementing FTI for postgres for some
time and my
current plan would be to implement a out-of-process fti engine (API +
sample
implementation, in the spirit of PostgreSQLs extensibility) that could
postpone
the actual indexing but still help with queries even for not yet fully
indexed stuff.

Will probably need some choreography but essential for high performance.

You generally don't want to wait for all index entries of an inverted
index to be saved.

Also the thing should be more general than the one in contrib , being
able to index
both fields and full records and support functional indexes.

Is there a way to make PostgresQL optimiser aware of the
selectivity/cost of function,
so that it can do the right thing for a query like

SELECT * FROM ARTICLES
WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
AND ARTICLES.FTI_MATCHES('(CAT & DOG) ! PRESIDENT')

It would be almost automatic if functions could return sets and then be
used like

SELECT * FROM ARTICLE
WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT & DOG) ! PRESIDENT')

and somehow the optimiser would know that it can join on the returned
ids but this
is probably not the case ;)

> c) It would be a start. One its in the tree, it gets used more, gets
> improved..

But, it is not a _real_ full text index, just a postgresql sample
application that
implements a full text index using an sql database.

----------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zeugswetter Andreas SB 2000-11-28 09:49:09 AW: 8192 BLCKSZ ?
Previous Message He weiping 2000-11-28 08:58:44 is it a bug?