Re: Fragments in tsearch2 headline

From: "Sushant Sinha" <sushant354(at)gmail(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Catalin Marinas" <catalin(dot)marinas(at)gmail(dot)com>, "Richard Huxton" <dev(at)archonet(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Subject: Re: Fragments in tsearch2 headline
Date: 2007-10-30 17:11:58
Message-ID: 9fb559330710301011n77ef2544n4ef73dfce3177ac4@mail.gmail.com
Lists: pgsql-general

This is a nice idea and seems easy to implement. I will try to write
it up and send a patch to the mailing list.
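
To make the fragment idea concrete, here is a rough sketch in Python (not the C that an actual patch would need). It assumes whitespace tokenization, exact word matching with no stemming, a query given as a plain list of words, and a hypothetical `context` parameter for the number of words kept on each side of a match. None of these names come from tsearch2.

```python
def fragment_headline(text, query_words, context=2,
                      start_sel="<b>", stop_sel="</b>", sep=" ... "):
    """Return excerpts of `context` words around each query-word match."""
    words = text.split()
    wanted = set(query_words)
    hits = [i for i, w in enumerate(words) if w in wanted]

    # Build [start, end) windows and merge those that overlap or touch.
    windows = []
    for i in hits:
        start = max(0, i - context)
        end = min(len(words), i + context + 1)
        if windows and start <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])

    # Highlight the matched words inside each window and join the fragments.
    fragments = []
    for start, end in windows:
        marked = [start_sel + w + stop_sel if w in wanted else w
                  for w in words[start:end]]
        fragments.append(" ".join(marked))
    return sep.join(fragments)


print(fragment_headline('1 2 3 4 5 3 4 abc x y z 2 3', ['2', 'abc']))
# prints: 1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y z <b>2</b> 3
```

Because the second '2' is close to 'abc', its window merges with the 'abc' window. A real implementation would still have to decide how many fragments to show and how long they may get.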

I was also working on adding support for phrase search. Currently, to
check for a phrase you have to match against the entire document. It would
be better if a filter like are_words_consecutive(tsvector *t, tsquery *q)
could be added to reduce the number of matching documents before we
actually do the phrase search. Do you think this would improve the
performance of phrase search? If so, I would like to write this
function and send a patch.
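
As an illustration of what such a filter would check, here is a minimal Python sketch. It models a tsvector as a dict from lexeme to a list of 1-based word positions and takes the phrase as an ordered list of lexemes. Both are assumptions made for the example, not the real C structures or the proposed signature.

```python
def are_words_consecutive(tsvector, phrase):
    """True if the phrase lexemes occur at consecutive positions.

    `tsvector` is modelled as a dict: lexeme -> list of 1-based positions.
    `phrase` is a list of lexemes in the order they must appear.
    """
    if not phrase:
        return False

    position_sets = []
    for lexeme in phrase:
        positions = tsvector.get(lexeme)
        if not positions:
            # A missing lexeme means the phrase cannot occur at all.
            return False
        position_sets.append(set(positions))

    # A start position p works if lexeme k sits at p + k for every k.
    return any(
        all(p + k in position_sets[k] for k in range(1, len(phrase)))
        for p in position_sets[0]
    )


# Positions for the text '1 2 3 4 5 3 4 abc x y z 2 3'
vec = {'1': [1], '2': [2, 12], '3': [3, 6, 13], '4': [4, 7], '5': [5],
       'abc': [8], 'x': [9], 'y': [10], 'z': [11]}

print(are_words_consecutive(vec, ['3', '4', 'abc']))  # prints: True
print(are_words_consecutive(vec, ['4', '2']))         # prints: False
```

The check only looks at stored positions, so it never touches the document text. That is the point of using it as a cheap filter before the full phrase match.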

-Sushant.

On 10/30/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> On Tue, 30 Oct 2007, Catalin Marinas wrote:
>
> > On 30/10/2007, Richard Huxton <dev(at)archonet(dot)com> wrote:
> >> Oleg Bartunov wrote:
> >>> Catalin,
> >>>
> >>> what is your need ? What's wrong with this ?
> >>>
> >>> postgres=# select ts_headline('1 2 3 4 5 3 4 abc abc 2 3 xyz',
> >>>     '2'::tsquery, 'StartSel=...,StopSel=...');
> >>> ts_headline
> >>> -------------------------------------------
> >>> 1 ...2... 3 4 5 3 4 abc abc ...2... 3 xyz
> >>
> >> I think he wants something like: "1 2 3 ... abc 2 3 ..."
> >>
> >> A few characters of context around each match and then ... between. Kind
> >> of like grep -C.
> >
> > That's pretty much correct (with the difference that I'd like context
> > of words rather than lines as in "grep" and StartSel=<b>,
> > StopSel=</b>).
> >
> > Since the text I want a headline for might be pretty long (tens of
> > lines), I'd like to only show the excerpts around the matching words.
> > Similar to the above example:
> >
> > select ts_headline('1 2 3 4 5 3 4 abc x y z 2 3', '2 & abc'::tsquery);
> >
> > should give:
> >
> > '1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y'
> >
> > Currently, if you limit the maximum words so that 'abc' is too far, it
> > only highlights the first match.
>
> ok, then you have to formalize many things - how long the excerpts should
> be, how many excerpts to show, etc. In tsearch2 we have the get_covers()
> function, which produces all excerpts like:
>
> =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'),
> '2&3'::tsquery);
> get_covers
> ------------------------------------------------
> 1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> (1 row)
>
> Once you formalize your requirements, you can look at it and adapt it to
> your needs (and share with people). I think it could be a nice contrib module.
>
>
> >
> > Many of the search engines (including google) show the headline this
> > way. I think Lucene can do this as well but I've never used it to be
> > sure.
> >
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
