Re: ts_headline

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: ts_headline
Date: 2008-02-22 12:24:06
Message-ID: Pine.LNX.4.64.0802221523340.31180@sn.sai.msu.ru
Lists: pgsql-general pgsql-patches

On Fri, 22 Feb 2008, Stephen Davies wrote:

> Hmmmm!
> I think I now understand the ts position better, thank you.
>
> Part of my problem has been that I am used to the functionality of Open Text's
> LCS (aka BASIS) product which handles text differently.
>
> It includes the position (and context) information in the index and does
> "remember" how the text was parsed so does not need to reparse to insert hit
> navigation tags nor need pointers as to how to parse queries. (It also
> supports phrase searching.)
>
> Now that I have a better understanding of ts, I think I will be able to make
> it do at least most of what I hoped for.

I'm wondering if it was not described in the text search documentation :)

>
> Thank you again for your help with this.
>
> Cheers,
> Stephen Davies
>
> On Friday 22 February 2008 20:45, Richard Huxton wrote:
>> Stephen Davies wrote:
>>> Unfortunately, my link to the box with the test database is down due to
>>> lack of maintenance by our local telco (Telstra) but I think that I also
>>> missed the optional config arg to ts_headline.
>>>
>>> The lack of link also means that I cannot confirm your findings but your
>>> logic looks good.
>>
>> Looks like ALTER DATABASE SET default_text_search_config='english' is what
>> you need.
>>
>>> It begs the question, however, as to why ts_headline needs to reparse the
>>> raw text.
>>
>> It needs to line up tsvector lexemes with actual characters in the text.
>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
>> as being stemmed (if your dictionary does that).
>>
>> Also, it's looking for a short span of words that provide the best
>> match. That might not be a complete match of course, and is different to
>> how you'd normally look to use a tsvector.
>>
>>> At least in my case, I am using a trigger to parse the combination of
>>> Title and Abstract to a tsvector field in the table row (as suggested in
>>> 12.2.2 and 12.4.3 in the doco) so that the tsvector is already available
>>> to ts_headline.
>>>
>>> If ts_headline had the ability to use that pre-parsed tsvector, my
>>> problem would never have arisen - and the performance of ts_headline
>>> would be improved.
>>
>> Maybe. It would still have to parse the text to some degree though, just
>> to get the original words & punctuation into the headline.
>
>
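
For the archives, a minimal sketch of the two points Richard makes above.
The database name mydb and the sample sentence are made up for illustration;
only ts_headline, to_tsvector, to_tsquery and default_text_search_config are
real names. The first query shows what the tsvector keeps (stemmed lexemes
with positions, no stopwords or punctuation), which is why ts_headline has
to go back to the raw text. The rest shows the two ways of telling
ts_headline which configuration to parse with.

```sql
-- What the tsvector stores: lexemes and positions only. The original
-- word forms, stopwords and punctuation are not recoverable from it.
SELECT to_tsvector('english',
                   'The quick brown foxes jumped over the lazy dogs.');

-- Option 1: make 'english' the default for new sessions on one database.
-- "mydb" is a hypothetical database name.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- Option 2: pass the configuration explicitly as the optional first
-- argument, so the raw text is reparsed with the same rules that were
-- used to build the query.
SELECT ts_headline('english',
                   'The quick brown foxes jumped over the lazy dogs.',
                   to_tsquery('english', 'fox & dog'));
```

With a matching configuration the stemmed query terms should line up with
"foxes" and "dogs" in the text; with a mismatched one they may not, which
looks like the problem discussed in this thread.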

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
