Re: full text indexing

From: "Mitch Vincent" <mitch(at)venux(dot)net>
To: "Poul L(dot) Christiansen" <poulc(at)cs(dot)auc(dot)dk>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: full text indexing
Date: 2000-09-27 20:33:39
Message-ID: 016301c028c2$39325420$0200000a@doot
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Hi
>
> In my PostgreSQL database I have a lot of newspaper articles (size:
> 100mb now, growing beyond 1gb within few months).
> I wan't to use full text indexing so that users can search the articles
> with a keyword and have the results in less than one second.
>
> How do I accomplish that?

It's complicated :-)

> Does PostgreSQL have this feature?

Nope.

> Which 3rd party indexing tools are available that easily interoperate
> with PostgreSQL?

There is some code in contrib called FTI (Full Text Index - no doubt).. I
re-wrote it for my uses but ended up not using because I was doing so many
sorts and joins, it made a scan (even an index scan) of a table with 3
million rows in it very slow. However id you were just searching the keyword
table itself it was very, very fast.

The FTI trigger in the contrib breaks the words down to 2 letter bits (for
substring searching) -- mine doesn't, it only indexes whole words without
duplicates and looks at a list of words not to index (words like a, an, the,
anything else you want -- I think it has the 300 most used English words in
there already)

It's drawback is speed, it does take a few seconds to index on INSERT (and
UPDATE) -- that's if your text fields are 30ish k (very close to the max PG
can store)..

I would say you're going to run into the 32k limit pretty quick with
newspaper articles -- I index resumes and I've run into it many times (32k
of text really isn't all that much)..

Anyway, I'll try and get that trigger together that I did and send it to the
PG guys to see if it's worthy of being added to contrib -- I'll send you a
copy in private if you'd like. Note: I think there have been more people to
re-write that trigger, I haven't seen anything else though..

Good luck!

<End of long-winded response>

-Mitch

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gunnar R|nning 2000-09-27 20:45:41 Re: full text indexing
Previous Message Michelle Murrain 2000-09-27 20:25:53 pg_dump problems