Re: Replacement for Oracle Text

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Stephen Davies <sdavies(at)sdc(dot)com(dot)au>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, s d <daku(dot)sandor(at)gmail(dot)com>, Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Replacement for Oracle Text
Date: 2016-02-20 05:51:49
Message-ID: CAKt_ZftXtFD-8BddWc2QwgDc5dC+Xc2_FwO7vpu0pKYRXA8jLA@mail.gmail.com
Lists: pgsql-general

A more general approach would be to write a function that takes a PDF as
input and returns its text. Mark it IMMUTABLE.
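
As a rough sketch of such a function (not something from this thread), here
is a plain Python version that shells out to pdftotext. It assumes pdftotext
is installed and on PATH; the name pdf_to_text and the `run` parameter are
made up for illustration.

```python
import os
import subprocess
import tempfile


def pdf_to_text(pdf_bytes: bytes, run=subprocess.run) -> str:
    """Return the plain text of a PDF using the pdftotext command-line tool.

    Assumption: pdftotext (e.g. from Poppler) is on PATH. The `run`
    parameter exists only so the subprocess call can be swapped out.
    """
    # pdftotext reads its input from a file, so spill the bytes to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(pdf_bytes)
        path = tmp.name
    try:
        # A text-file argument of "-" sends the extracted text to stdout.
        result = run(
            ["pdftotext", "-enc", "UTF-8", path, "-"],
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.unlink(path)
    return result.stdout.decode("utf-8", errors="replace")
```

Inside the database, the same body could probably be wrapped in a
CREATE FUNCTION ... LANGUAGE plpython3u IMMUTABLE definition taking a bytea
argument. Note that PostgreSQL takes IMMUTABLE on trust: if the external
tool's output changes, anything built on the old output goes stale.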

Then you can build an index on the expression that converts that text to a
tsvector.
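
To make the indexing step concrete, here is a small Python sketch that only
builds the SQL strings involved. Everything named here is hypothetical: the
SQL function pdf_to_text() is assumed to exist and be declared IMMUTABLE,
and the table "docs" with a bytea column "body" is invented for the example.

```python
def fts_expression(column: str, config: str = "english") -> str:
    """SQL expression turning a PDF column into a tsvector.

    pdf_to_text() is a hypothetical SQL-level function, assumed IMMUTABLE.
    """
    return f"to_tsvector('{config}', pdf_to_text({column}))"


def fts_index_ddl(table: str, column: str, config: str = "english") -> str:
    """DDL for a GIN expression index over the extracted text."""
    # Identifiers are interpolated unquoted: pass trusted names only.
    expr = fts_expression(column, config)
    return f"CREATE INDEX {table}_{column}_fts_idx ON {table} USING gin ({expr});"


def fts_query(table: str, column: str, config: str = "english") -> str:
    """Query that repeats the indexed expression so the planner can use it.

    %s is a driver-style placeholder for the search terms (an assumption).
    """
    expr = fts_expression(column, config)
    return f"SELECT * FROM {table} WHERE {expr} @@ plainto_tsquery('{config}', %s);"


print(fts_index_ddl("docs", "body"))
print(fts_query("docs", "body"))
```

The point of the two-argument to_tsvector() form is that it is immutable,
which an index expression requires, and the query has to repeat the exact
same expression for the index to be considered.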

You may want to pull everything into a tsvector column for ease of review,
but functional indexes make that less important.

On Sat, Feb 20, 2016 at 1:10 AM, Stephen Davies <sdavies(at)sdc(dot)com(dot)au> wrote:

> On 20/02/16 00:24, Bruce Momjian wrote:
>
>> On Fri, Feb 19, 2016 at 02:49:16PM +0100, s d wrote:
>>
>>> On 19 February 2016 at 14:19, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> > Ah, no. That's not possible
>>> >
>>> >
>>> > ...not possible, Yet.
>>> >
>>> > PostgreSQL grows by adding the features people need, and it's
>>> > changing rapidly.
>>>
>>> I wonder if PL/Perl could be used to extract the words from a PDF
>>> document and create a tsvector column from it.
>>>
>>> I don't know about PL/Perl (I'm pretty sure it could be used for this
>>> purpose, though). On the other hand, I've written code for this in
>>> Python, which should be easy to adapt for PL/Python if necessary.
>>>
>>
>> Right, so you would write a PL/Perl or PL/Python trigger function that
>> would populate the tsvector column on every INSERT or UPDATE.
>>
>> FWIW, I just use pdftotext in my CGI.
>
> --
>
> =============================================================================
> Stephen Davies Consulting P/L             Phone: 08-8177 1595
> Adelaide, South Australia.                Mobile: 040 304 0583
>

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
