From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Stephen Davies <scldad(at)sdc(dot)com(dot)au> |
Cc: | Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: ts_headline |
Date: | 2008-02-22 12:24:06 |
Message-ID: | Pine.LNX.4.64.0802221523340.31180@sn.sai.msu.ru |
Lists: | pgsql-general pgsql-patches |
On Fri, 22 Feb 2008, Stephen Davies wrote:
> Hmmmm!
> I think I now understand the ts position better, thank you.
>
> Part of my problem has been that I am used to the functionality of Open Text's
> LCS (aka BASIS) product which handles text differently.
>
> It includes the position (and context) information in the index and does
> "remember" how the text was parsed so does not need to reparse to insert hit
> navigation tags nor need pointers as to how to parse queries. (It also
> supports phrase searching.)
>
> Now that I have a better understanding of ts, I think I will be able to make
> it do at least most of what I hoped for.
I wonder whether this is not already described in the text search documentation :)
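For the archives, a small sketch of the difference, using an arbitrary sample sentence (not taken from your data). A tsvector keeps lexemes and their word positions, not character offsets into the original text, so ts_headline has to look at the text again:

```sql
-- Stop words and punctuation are dropped and the remaining words are stemmed;
-- only lexemes and their word positions are stored.
SELECT to_tsvector('english', 'The cat sat on the mat, and it purred.');

-- ts_headline parses the original text again so that it can wrap the
-- matching words (by default in <b> ... </b>) and keep the rest verbatim.
SELECT ts_headline('english',
                   'The cat sat on the mat, and it purred.',
                   to_tsquery('english', 'mat'));
```

Comparing the two results should show what the tsvector alone cannot reproduce.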
>
> Thank you again for your help with this.
>
> Cheers,
> Stephen Davies
>
> On Friday 22 February 2008 20:45, Richard Huxton wrote:
>> Stephen Davies wrote:
>>> Unfortunately, my link to the box with the test database is down due to
>>> lack of maintenance by our local telco (Telstra) but I think that I also
>>> missed the optional config arg to ts_headline.
>>>
>>> The lack of link also means that I cannot confirm your findings but your
>>> logic looks good.
>>
>> Looks like ALTER DATABASE ... SET default_text_search_config = 'english'
>> is what you need.
>>
>>> It raises the question, however, as to why ts_headline needs to reparse
>>> the raw text.
>>
>> It needs to line up tsvector lexemes with actual characters in the text.
>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
>> as being stemmed (if your dictionary does that).
>>
>> Also, it's looking for a short span of words that provide the best
>> match. That might not be a complete match of course, and is different to
>> how you'd normally look to use a tsvector.
>>
>>> At least in my case, I am using a trigger to parse the combination of
>>> Title and Abstract to a tsvector field in the table row (as suggested in
>>> 12.2.2 and 12.4.3 in the doco) so that the tsvector is already available
>>> to ts_headline.
>>>
>>> If ts_headline had the ability to use that pre-parsed tsvector, my
>>> problem would never have arisen - and the performance of ts_headline
>>> would be improved.
>>
>> Maybe. It would still have to parse the text to some degree though, just
>> to get the original words & punctuation into the headline.
>
>
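To put the pieces of this thread together, here is a minimal sketch. The table, column, and trigger names (papers, title, abstract, tsv, papers_tsv_update) and the query terms are only assumptions for illustration, not your actual schema. The point is that the stored tsvector is used for matching, while ts_headline is given the same configuration explicitly so that it parses the raw text the same way:

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE papers (
    id       serial PRIMARY KEY,
    title    text,
    abstract text,
    tsv      tsvector
);

-- Built-in trigger function that keeps tsv in sync with title and abstract.
-- The configuration name must be schema-qualified here.
CREATE TRIGGER papers_tsv_update
    BEFORE INSERT OR UPDATE ON papers
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', title, abstract);

-- The stored tsvector does the matching; ts_headline reparses the raw text,
-- so pass it the same configuration instead of relying on the default.
SELECT id,
       ts_headline('english',
                   coalesce(title, '') || ' ' || coalesce(abstract, ''),
                   q,
                   'MaxWords=20, MinWords=5') AS headline
FROM papers, to_tsquery('english', 'parse & text') AS q
WHERE tsv @@ q;
```

If the configuration argument is left out, ts_headline falls back to default_text_search_config, which is why setting that parameter for the database also works.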
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83