tsvector from external files

From: Perry Smith <pedzsan(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: tsvector from external files
Date: 2009-12-05 16:16:58
Message-ID: 97E0D2CA-1302-4EB7-8F0C-34ED54F37DC9@gmail.com
Lists: pgsql-general

Hi,

In the documentation there is this statement:

> Another possibility is to store the documents as simple text files
> in the file system. In this case, the database
> can be used to store the full text index and to execute searches,
> and some unique identifier can be used to
> retrieve the document from the file system.

It goes on to explain that there will be some limitations but I
believe this is the path I want to go.

Eventually I will be using Ruby to interface with pglib. I'm not
asking for Ruby help, but some stepping stones would be useful: for
example, pointing out the relevant interface in pglib would help me a
great deal.

For example, in to_tsvector([config regconfig , ] document text) --
how do I give it an external file for document text? I can think of
many possible approaches but I thought I would ask here first for
suggestions.
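
A minimal sketch of one possible approach, assuming the client simply
reads the whole file and sends its contents as an ordinary text
parameter, so that to_tsvector runs on the server and only the file text
crosses the wire. Python is used purely for illustration; the table
"documents", its columns, and the function name index_file are
hypothetical, and the cursor is assumed to be any DB-API cursor using
%s placeholders (psycopg2, for instance).

```python
# Hypothetical table: documents(id integer, path text, body_tsv tsvector)
INSERT_SQL = (
    "INSERT INTO documents (id, path, body_tsv) "
    "VALUES (%s, %s, to_tsvector(%s::regconfig, %s))"
)


def index_file(cursor, doc_id, path, config="english"):
    """Read an external text file and store only its tsvector.

    The document itself stays in the file system; the database keeps the
    path as the identifier needed to retrieve it later.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # The tsvector is built server-side and is never returned to the client.
    cursor.execute(INSERT_SQL, (doc_id, path, config, text))
```

This keeps the tsvector on the server, at the cost of holding the whole
file in client memory for one statement.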

One approach is to loop, feeding perhaps 4K blocks of text at a time
and combining the results with the || operator, but that has two
disadvantages. One is that the tsvector, as it grows, is pushed back
and forth across the client / server interface. The other is that this
approach will not give me exactly the same result (as explained in the
description of the || operator).
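
A rough sketch of how the block-wise variant might avoid both problems,
under stated assumptions: the append is done with an UPDATE so the
growing tsvector never leaves the server, and blocks are cut only at
whitespace so no word is split in two. The table "documents", its
columns, and the helper names are hypothetical; the row is assumed to
exist already with body_tsv set to ''::tsvector (NULL || x would stay
NULL). Positions can still differ slightly from a single to_tsvector
call, because stop words at the end of one block do not shift the
positions in the next.

```python
# Hypothetical table: documents(id integer, path text, body_tsv tsvector)
APPEND_SQL = (
    "UPDATE documents "
    "SET body_tsv = body_tsv || to_tsvector(%s::regconfig, %s) "
    "WHERE id = %s"
)


def iter_chunks(fileobj, size=4096):
    """Yield pieces of roughly `size` characters, cut only at whitespace."""
    carry = ""
    while True:
        block = fileobj.read(size)
        if not block:
            break
        block = carry + block
        # Cut after the last whitespace so a word never spans two chunks.
        cut = max(block.rfind(" "), block.rfind("\n"), block.rfind("\t"))
        if cut == -1:
            carry = block  # no whitespace seen yet: keep accumulating
            continue
        yield block[:cut + 1]
        carry = block[cut + 1:]
    if carry:
        yield carry


def append_file(cursor, doc_id, path, config="english", size=4096):
    """Append a file to an existing row's tsvector, one chunk per UPDATE."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for chunk in iter_chunks(f, size):
            cursor.execute(APPEND_SQL, (config, chunk, doc_id))
            count += 1
    return count
```

Only the raw text blocks travel to the server; the concatenation happens
inside the UPDATE.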

The second approach is to create a large object first, but that seems
inefficient too. It's also not clear that I can pass a reference to a
large object in place of document text.

Thank you,
Perry Smith
