Re: integration of fulltext search in bytea/docs

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: integration of fulltext search in bytea/docs
Date: 2009-07-29 15:23:46
Message-ID: 20090729152345.GF5407@samason.me.uk
Lists: pgsql-general

On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote:
> is integration of full-text searching of documents saved in blobs
> (bytea) on the PostgreSQL roadmap?

Do you mean bytea or large-objects?

> It would be very nice (PostgreSQL users could be proud to be first) to save
> documents into bytea and search that field via to_tsvector, to_tsquery ...

This seems easy. For large objects, just use lo_export() to dump the
blob out to the filesystem, and then use something like PL/Perl (the
untrusted plperlu variant, since it has to run an external program) to
run antiword (a Word-to-text converter) on it, saving the results to
another file and then returning that file line-by-line as a SETOF TEXT.
I think returning lines is the best way of handling things in case the
resulting text file is enormous. If this function were called
"runfilter", we could use it like:

-- rebuild each file's index from the lines of text runfilter() returns
UPDATE myfiles f SET tsidx = (
  SELECT ts_accum(to_tsvector(t))
  FROM runfilter(f.loid) t);

Where we've defined ts_accum to be:

-- fold a set of tsvectors into one, starting from an empty tsvector
CREATE AGGREGATE ts_accum (tsvector) (
  SFUNC = tsvector_concat,
  STYPE = tsvector,
  INITCOND = ''
);
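A minimal sketch of the runfilter function described above follows. It assumes the untrusted PL/Perl language (plperlu) is installed, antiword is on the database server's PATH, and the server may write scratch files under /tmp; the scratch file names are made up for illustration. lo_export() runs on the server and typically needs superuser rights, so treat this as an outline rather than a drop-in implementation.

```sql
-- Sketch only: assumes plperlu, antiword on the server's PATH and a
-- writable /tmp. The scratch file names below are hypothetical.
CREATE OR REPLACE FUNCTION runfilter(loid oid) RETURNS SETOF text
LANGUAGE plperlu AS $perl$
    my ($loid) = @_;
    # Scratch files named after the large object and this backend's PID.
    my $in  = "/tmp/runfilter_${loid}_$$.doc";
    my $out = "/tmp/runfilter_${loid}_$$.txt";

    # Dump the large object to the server's filesystem. $loid is an oid
    # (an integer), so interpolating it into the query is safe here.
    spi_exec_query("SELECT lo_export($loid, '$in')");

    # Convert the document to plain text, saving it to another file.
    system("antiword $in > $out") == 0
        or elog(ERROR, "antiword failed on $in");

    # Return the text one line at a time rather than as one huge value.
    open(my $fh, '<', $out) or elog(ERROR, "cannot open $out: $!");
    while (my $line = <$fh>) {
        chomp $line;
        return_next($line);
    }
    close $fh;
    unlink $in, $out;
    return undef;
$perl$;
```

The body is dollar-quoted with $perl$ rather than $$ because Perl uses $$ for the process ID. If antiword fails, elog(ERROR) aborts before the cleanup, so leftover scratch files would need removing separately.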

bytea is different: because it's an ordinary column, you know when the
value has changed (i.e. you can write a trigger), but you need to write
more code to get the bytea value out onto the filesystem where antiword
can read it.
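A trigger for the bytea case might look like the sketch below. It assumes a hypothetical table mydocs whose bytea column data holds UTF-8 plain text, so convert_from() can stand in for the conversion step; binary formats such as Word files would still need an external filter like the antiword one described above, fed from the bytea value.

```sql
-- Hypothetical table: "data" holds the document as bytea.
CREATE TABLE mydocs (
    id    serial PRIMARY KEY,
    data  bytea,
    tsidx tsvector
);

-- Rebuild tsidx from the bytea value. ASSUMPTION: data is UTF-8 plain
-- text; a Word file would need an external filter such as antiword here.
CREATE OR REPLACE FUNCTION mydocs_tsidx_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.tsidx := to_tsvector('english', convert_from(NEW.data, 'UTF8'));
    RETURN NEW;
END;
$$;

-- Fire on every insert and on any update that sets the data column.
CREATE TRIGGER mydocs_tsidx
    BEFORE INSERT OR UPDATE OF data ON mydocs
    FOR EACH ROW EXECUTE PROCEDURE mydocs_tsidx_update();
```

After inserting convert_to('The quick brown fox', 'UTF8') into data, the row matches tsidx @@ to_tsquery('english', 'fox'). The UPDATE OF data clause (PostgreSQL 9.0 and later) means updates that never set data skip the re-index.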

--
Sam http://samason.me.uk/
