Re: [GENERAL] ts_headline

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Stephen Davies <scldad(at)sdc(dot)com(dot)au>, Richard Huxton <dev(at)archonet(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [GENERAL] ts_headline
Date: 2008-03-04 03:19:53
Message-ID: 200803040319.m243JsU23168@momjian.us
Lists: pgsql-general pgsql-patches


I have applied the attached documentation patch to show ts_headline()
using a configuration name.
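
For readers without the patch at hand, a call that names the configuration
explicitly might look like the sketch below. It is not the text of the patch:
the sample sentence and query terms are illustrative only, and 'english' is
assumed to be an installed text search configuration.

```sql
-- Pass the text search configuration as the first argument so the
-- result does not depend on the session's default_text_search_config.
SELECT ts_headline(
    'english',
    'The most common type of search is to find all documents containing given query terms.',
    to_tsquery('english', 'query & documents')
);
```

The two-argument form, ts_headline(document, query), falls back to the
configuration named by default_text_search_config.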

---------------------------------------------------------------------------

Oleg Bartunov wrote:
> On Sat, 23 Feb 2008, Stephen Davies wrote:
>
> > As it turns out, all I needed was in the doco but the key element - the first
> > config arg to ts_headline - was not in any of the examples so I missed it.
>
> Aha. The original examples were based on the default configuration; the
> concept was later changed, but the examples were not updated.
>
> >
> > Would it be possible for ts_headline to work with the pre-parsed tsvector?
>
> It's impossible; Richard already explained the reasons to you.
>
> >
> > I see references to future plans for phrase searching in ts. Is there a date
> > for this?
>
> Not yet. The problem is mostly algebraic :) A simple 'exact search' is doable,
> but we need something more, since we support boolean operators,
> pluggable dictionaries (which could produce several lexemes, for example),
> and document structure (lexeme weights). So we need to define a consistent
> algebra for text in order to get predictable results. This is quite a complex
> task that requires a lot of dedicated time, which we don't have.
>
> >
> > Cheers and thanks,
> > Stephen
> > Davies
> >
> >
> > On Friday 22 February 2008 22:54, Oleg Bartunov wrote:
> >> On Fri, 22 Feb 2008, Stephen Davies wrote:
> >>> Hmmmm!
> >>> I think I now understand the ts position better, thank you.
> >>>
> >>> Part of my problem has been that I am used to the functionality of Open
> >>> Text's LCS (aka BASIS) product which handles text differently.
> >>>
> >>> It includes the position (and context) information in the index and does
> >>> "remember" how the text was parsed so does not need to reparse to insert
> >>> hit navigation tags nor need pointers as to how to parse queries. (It
> >>> also supports phrase searching.)
> >>>
> >>> Now that I have a better understanding of ts, I think I will be able to
> >>> make it do at least most of what I hoped for.
> >>
> >> I'm wondering if it was not described in the text search documentation :)
> >>
> >>> Thank you again for your help with this.
> >>>
> >>> Cheers,
> >>> Stephen Davies
> >>>
> >>> On Friday 22 February 2008 20:45, Richard Huxton wrote:
> >>>> Stephen Davies wrote:
> >>>>> Unfortunately, my link to the box with the test database is down due to
> >>>>> lack of maintenance by our local telco (Telstra) but I think that I
> >>>>> also missed the optional config arg to ts_headline.
> >>>>>
> >>>>> The lack of link also means that I cannot confirm your findings but
> >>>>> your logic looks good.
> >>>>
> >>>> Looks like ALTER DATABASE <name> SET default_text_search_config = 'english'
> >>>> is what you need.
> >>>>
> >>>>> It begs the question, however, as to why ts_headline needs to reparse
> >>>>> the raw text.
> >>>>
> >>>> It needs to line up tsvector lexemes with actual characters in the text.
> >>>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
> >>>> as being stemmed (if your dictionary does that).
> >>>>
> >>>> Also, it's looking for a short span of words that provide the best
> >>>> match. That might not be a complete match of course, and is different to
> >>>> how you'd normally look to use a tsvector.
> >>>>
> >>>>> At least in my case, I am using a trigger to parse the combination of
> >>>>> Title and Abstract to a tsvector field in the table row (as suggested
> >>>>> in 12.2.2 and 12.4.3 in the doco) so that the tsvector is already
> >>>>> available to ts_headline.
> >>>>>
> >>>>> If ts_headline had the ability to use that pre-parsed tsvector, my
> >>>>> problem would never have arisen - and the performance of ts_headline
> >>>>> would be improved.
> >>>>
> >>>> Maybe. It would still have to parse the text to some degree though, just
> >>>> to get the original words & punctuation into the headline.
> >>
> >> Regards,
> >> Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >
> >
>
> Regards,
> Oleg

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment: /rtmp/diff (text/x-diff, 1.5 KB)
