Re: Database-based alternatives to tsearch2?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Wes <wespvp(at)syntegra(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Database-based alternatives to tsearch2?
Date: 2006-12-12 19:32:25
Message-ID: 1165951945.1651.38.camel@dogma.v10.wvs
Lists: pgsql-general

On Tue, 2006-12-12 at 12:19 -0600, Wes wrote:
> I'm looking for a non index-based full text indexing - one that stores the
> information as table data instead of index data. I do not need to implement
> SQL operators for searches. The application library would need to implement
> the actual word search.
>

Store the tsvector (a custom type provided by tsearch2) as a separate
column in the table. This data type holds all the important information
about the indexed text, such as distinct words and some position
information, but it takes up much less space than a large document.
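As a rough illustration of what such a column holds, here is a minimal Python sketch that builds a tsvector-like mapping from each distinct word to its 1-based positions. The name `to_vector` is hypothetical, and this is not how tsearch2 works internally: the real `to_tsvector` function also normalizes (stems) words, drops stop words, and can attach weights, none of which is done here.

```python
from __future__ import annotations

import re
from collections import defaultdict


def to_vector(text: str) -> dict[str, list[int]]:
    """Toy tsvector-like value: each distinct lowercase word -> its 1-based positions.

    Assumption: no stemming, stop-word removal, or weights (tsearch2 does these).
    """
    positions: dict[str, list[int]] = defaultdict(list)
    words = re.findall(r"[a-z0-9]+", text.lower())
    for pos, word in enumerate(words, start=1):
        positions[word].append(pos)
    return dict(positions)
```

For example, `to_vector("The cat saw the other cat")` gives `{'the': [1, 4], 'cat': [2, 6], 'saw': [3], 'other': [5]}`: each repeated word is stored once, together with the list of places it occurs.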

The tsearch2 package provides a lot of functionality even without the
index. But after you have a tsvector column, you can create an index on
it if you want.
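To show what searching without an index looks like when the application does the word matching itself, here is a minimal Python sketch over hypothetical `(id, vector)` rows, where each vector maps a word to its positions. `search_all` and its scoring rule (total occurrences of the query words) are illustrative assumptions; tsearch2's own `@@` match operator and ranking functions are the real equivalents and behave differently. Every row is examined, which is what a query has to do when no index is available.

```python
from __future__ import annotations


def search_all(rows: list[tuple[int, dict[str, list[int]]]], terms: list[str]) -> list[int]:
    """Sequential scan: ids of rows whose vector contains every query term.

    Hypothetical scoring: total occurrences of the query terms, highest first,
    ties broken by ascending id.
    """
    wanted = [t.lower() for t in terms]
    hits: list[tuple[int, int]] = []
    for doc_id, vec in rows:  # no index, so every row is examined
        if all(t in vec for t in wanted):
            score = sum(len(vec[t]) for t in wanted)
            hits.append((score, doc_id))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits]


# Hypothetical rows as an application might read them from a tsvector-like column.
rows = [
    (1, {"postgres": [1], "index": [2, 5]}),
    (2, {"postgres": [3], "gin": [1]}),
    (3, {"index": [4], "gin": [2]}),
]
print(search_all(rows, ["index"]))  # [1, 3]
```

The cost grows with the number of rows, which is the trade-off that an index on the tsvector column removes.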

> Indexes are too fragile. Our documents will be offline, and re-indexing
> would be impossible. Additionally, as I understand it, tsearch2 doesn't
> scale to the numbers I need (hundreds of millions of documents).
>

Try PostgreSQL 8.2 with tsearch2 using a GIN index (GIN is new in 8.2).
As I understand it, that's very scalable.

Also, as I understand it, a GIN index should not need to be reindexed
unless there is a huge shift in the set of distinct words you're using.
However, if you do need to reindex, you can rebuild the index from the
tsvector column, even if the original documents are offline.
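A minimal Python sketch of why the stored column is enough to rebuild an index: an inverted index, which is the idea behind GIN, maps each distinct word to the rows that contain it, and it can be computed from the stored vectors alone, without the original documents. The function names are hypothetical, and a real GIN index is an on-disk structure maintained by PostgreSQL, not a Python dict. In this toy version the keys are the distinct words, so adding rows that reuse existing words only extends existing entries.

```python
from __future__ import annotations

from collections import defaultdict


def build_inverted_index(rows: list[tuple[int, dict[str, list[int]]]]) -> dict[str, list[int]]:
    """Rebuild word -> sorted row ids using only the stored vectors."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, vec in rows:
        for word in vec:  # the vector's keys are its distinct words
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def lookup_all(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """AND query answered from the index by intersecting posting lists."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical stored vectors; the original documents are not needed.
rows = [
    (1, {"postgres": [1], "index": [2, 5]}),
    (2, {"postgres": [3], "gin": [1]}),
    (3, {"index": [4], "gin": [2]}),
]
index = build_inverted_index(rows)
print(lookup_all(index, ["gin", "index"]))  # [3]
```

Intersecting the posting lists answers an AND query by touching only the entries for the query words instead of scanning every row.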

Regards,
Jeff Davis
