Skip site navigation (1) Skip section navigation (2)

Re: Full text Indexing -out of contrib and into main..

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: john huttley <john(at)mwk(dot)co(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Don Baccus <dhogaza(at)pacifier(dot)com>, Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Full text Indexing -out of contrib and into main..
Date: 2000-11-28 09:27:51
Message-ID: 3A237A97.D10C9E20@tm.ee (view raw or flat)
Thread:
Lists: pgsql-hackers
john huttley wrote:
> 
> > I believe that it is appropriate for contrib/ because it is a good demo
> > of FTI-like capabilities. But nothing more, yet. For at least a couple
> > of reasons:
> >
> > 1) It generates the "index" as a table, not a PostgreSQL index or
> > index-like thing.
> >
> > 2) It has a hardcoded list of non-indexed words. This should come from a
> > table, to allow it to be tuned to the application requirements.
> >
> > Comments?
> >
> >                      - Thomas
> >
> 
> In general..
> a) Considering that I was coding up the same thing with triggers and such,
> things could only get better.

AFAIK, the one in contrib _is_ the same thing coded up with triggers and
such ;)

> b) Check out MSSQL 7's  capabilities and weep.

BTW, have you studied MSSQL enough to tell me if it has a
separate/standalone 
(as a process) fti engine or just another index type.

I have been contemplating about implementing FTI for postgres for some
time and my 
current plan would be to implement a out-of-process fti engine (API +
sample 
implementation, in the spirit of PostgreSQLs extensibility) that could
postpone 
the actual indexing but still help with queries even for not yet fully
indexed stuff.

Will probably need some choreography but essential for high performance.

You generally don't want to wait for all index entries of an inverted
index to be saved.

Also the thing should be more general than the one in contrib , being
able to index 
both fields and full records and support functional indexes.


Is there a way to make PostgresQL optimiser aware of the
selectivity/cost of function, 
so that it can do the right thing for a query like

SELECT * FROM ARTICLES
  WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
    AND ARTICLES.FTI_MATCHES('(CAT & DOG) ! PRESIDENT')

It would be almost automatic if functions could return sets and then be
used like

SELECT * FROM ARTICLE
  WHERE ADATE BETWEEN YESTERDAY AND TOMORROW
    AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT & DOG) ! PRESIDENT')

and somehow the optimiser would know that it can join on the returned
ids but this 
is probably not the case ;)

> c) It would be a start. One its in the tree, it gets used more, gets
> improved..

But, it is not a _real_ full text index, just a postgresql sample
application that 
implements a full text index using an sql database.

----------
Hannu

In response to

pgsql-hackers by date

Next:From: Zeugswetter Andreas SBDate: 2000-11-28 09:49:09
Subject: AW: 8192 BLCKSZ ?
Previous:From: He weipingDate: 2000-11-28 08:58:44
Subject: is it a bug?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group