Re: ts_headline

From: Richard Huxton <dev(at)archonet(dot)com>
To: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ts_headline
Date: 2008-02-22 10:15:50
Message-ID: 47BEA0D6.1040105@archonet.com
Lists: pgsql-general pgsql-patches

Stephen Davies wrote:
> Unfortunately, my link to the box with the test database is down due to lack
> of maintenance by our local telco (Telstra) but I think that I also missed
> the optional config arg to ts_headline.
>
> The lack of link also means that I cannot confirm your findings but your logic
> looks good.

Looks like ALTER DATABASE yourdb SET default_text_search_config = 'english'
(with your own database name) is what you need.
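
A rough sketch of the two options, assuming PostgreSQL 8.3, where the
setting is spelled default_text_search_config. The database name mydb and
the sample sentence are placeholders, not taken from your setup:

```sql
-- 'mydb' is a placeholder database name.
-- The new default applies to sessions started after this statement.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- Check what the current session is using.
SHOW default_text_search_config;

-- Alternatively, bypass the default by naming the configuration
-- in the ts_headline call itself (the optional first argument).
SELECT ts_headline('english',
                   'The quick brown foxes jumped over the lazy dogs.',
                   to_tsquery('english', 'fox & dog'));
```

Passing the configuration explicitly keeps the query independent of
whatever default the server or database happens to have.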

> It raises the question, however, as to why ts_headline needs to reparse the
> raw text.

It needs to line up tsvector lexemes with actual characters in the text.
The tsvector contains no punctuation and no stopwords (the, it, a), and
its lexemes are stemmed (if your dictionary does that).
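
To see what gets lost, compare a sentence with its tsvector. This is only
an illustration with a made-up sentence, assuming the stock 'english'
configuration:

```sql
-- The result contains stemmed lexemes with word positions only:
-- stopwords such as "the" and the trailing full stop are not stored,
-- and "foxes" is reduced to the lexeme 'fox'.
SELECT to_tsvector('english',
                   'The quick brown foxes jumped over the lazy dogs.');
```

Nothing in that output says where "foxes" starts and ends in the original
string, which is what a headline has to highlight.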

Also, it's looking for a short span of words that provides the best
match. That might not be a complete match of course, and is different from
how you'd normally use a tsvector.
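
A small sketch of that behaviour, again with a made-up sentence. The
MaxWords and MinWords options bound the length of the excerpt; the values
here are arbitrary:

```sql
-- The query asks for 'fox' OR 'cat'; the text mentions only foxes,
-- so the headline is built around a partial match.
SELECT ts_headline('english',
                   'The quick brown foxes jumped over the lazy dogs '
                   'while the farmer slept in the barn.',
                   to_tsquery('english', 'fox | cat'),
                   'MaxWords=10, MinWords=5');
```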

> At least in my case, I am using a trigger to parse the combination of Title
> and Abstract to a tsvector field in the table row (as suggested in 12.2.2
> and 12.4.3 in the doco) so that the tsvector is already available to
> ts_headline.
>
> If ts_headline had the ability to use that pre-parsed tsvector, my problem
> would never have arisen - and the performance of ts_headline would be
> improved.

Maybe. It would still have to parse the text to some degree though, just
to get the original words and punctuation into the headline.
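
In the meantime, the usual way to keep the cost down is to search on the
stored tsvector and only call ts_headline on the rows you actually
display. A minimal sketch, assuming a hypothetical table papers with
columns title, abstract and tsv (your names will differ):

```sql
-- Hypothetical table; the names are illustrative only.
CREATE TABLE papers (
    id       serial PRIMARY KEY,
    title    text,
    abstract text,
    tsv      tsvector
);

-- Built-in trigger function that keeps tsv in step with title + abstract.
CREATE TRIGGER papers_tsv_update
    BEFORE INSERT OR UPDATE ON papers
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english',
                                              title, abstract);

-- Match and rank against the stored tsvector, then build headlines from
-- the raw text for the ten best rows only.
SELECT id,
       ts_headline('english',
                   coalesce(title, '') || ' ' || coalesce(abstract, ''),
                   q) AS headline
FROM (
    SELECT id, title, abstract, q
    FROM papers, to_tsquery('english', 'fox & dog') AS q
    WHERE tsv @@ q
    ORDER BY ts_rank(tsv, q) DESC
    LIMIT 10
) AS hits;
```

The reparse still happens, but only for the handful of rows that survive
the LIMIT rather than for every match.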

--
Richard Huxton
Archonet Ltd
