
Re: Fragments in tsearch2 headline

From: "Sushant Sinha" <sushant354(at)gmail(dot)com>
To: "Catalin Marinas" <catalin(dot)marinas(at)gmail(dot)com>
Cc: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Richard Huxton" <dev(at)archonet(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Subject: Re: Fragments in tsearch2 headline
Date: 2007-11-12 03:46:50
Message-ID: 9fb559330711111946s34786d4ckf0fcef0b23626add@mail.gmail.com
Lists: pgsql-general
I wrote a headline generation function for my app and have attached
the patch (against CVS HEAD). It generates multiple contexts in
which the query appears. Essentially, it uses the cover function to
generate all covers, chooses the smallest covers, and stretches each
selected cover according to the chosen parameters. Ideally the
changes should be made in the prsd_headline function, but I couldn't
understand that segment of code well.
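
To illustrate what "generate all covers" means, here is a minimal sketch in Python. It is not the code from the patch: it assumes a plain AND query, works on whitespace-split tokens rather than lexemes, and find_covers is a hypothetical name. A cover here is a minimal span of words that contains every query term.

```python
def find_covers(tokens, terms):
    """Return minimal (start, end) spans, inclusive and 0-based,
    that contain every term in `terms` (AND semantics only)."""
    terms = set(terms)
    last_seen = {}      # term -> most recent position
    covers = []
    prev_start = None
    for pos, tok in enumerate(tokens):
        if tok not in terms:
            continue
        last_seen[tok] = pos
        if len(last_seen) < len(terms):
            continue    # some term has not appeared yet
        # The earliest of the latest occurrences is the tightest left edge.
        start = min(last_seen.values())
        # A repeated left edge means this span contains an earlier cover.
        if start != prev_start:
            covers.append((start, pos))
            prev_start = start
    return covers


doc = "1 2 3 4 5 3 4 abc x y z 2 3".split()
print(find_covers(doc, ["2", "3"]))
# [(1, 2), (5, 11), (11, 12)]
```

With 1-based positions these are the spans 2-3, 6-12 and 12-13, the same three covers that get_covers() marks in Oleg's example quoted below.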

The SQL interface is:

headline_with_fragments(text parser, tsvector docvector, text doc,
tsquery queryin, int4 maxcoverSize, int4 mincoverSize, int4 maxWords)
 RETURNS text

This generates a headline of up to maxWords words, with each cover
stretched to maxcoverSize words. It will not add any fragment shorter
than mincoverSize words.
I am running my app with maxcoverSize = 20, mincoverSize = 5, and maxWords = 40,
so it shows roughly two fragments per query.
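
How the three parameters interact can be sketched in Python as follows. This is only one plausible reading of the description above, not the logic of the patch: build_fragments is a hypothetical name, and skipping covers longer than max_cover_size, padding on both sides, and dropping overlapping fragments are all assumptions.

```python
def build_fragments(tokens, covers, max_cover_size, min_cover_size, max_words):
    """Pick the smallest covers, stretch each to max_cover_size words,
    and stop once the max_words budget is used up."""
    chosen = []                 # (start, end) word ranges, inclusive
    remaining = max_words
    # Smallest covers first; ties go to the earlier one.
    for start, end in sorted(covers, key=lambda c: (c[1] - c[0], c[0])):
        length = end - start + 1
        if length > max_cover_size:
            continue            # assumption: oversized covers are skipped
        size = min(max_cover_size, remaining)
        if size < min_cover_size or size < length:
            break               # not enough budget left for a fragment
        # Stretch: pad on both sides, clamped to the document bounds.
        pad = size - length
        lo = max(0, start - pad // 2)
        hi = min(len(tokens) - 1, lo + size - 1)
        lo = max(0, hi - size + 1)
        if any(lo <= e and s <= hi for s, e in chosen):
            continue            # assumption: overlapping fragments are dropped
        chosen.append((lo, hi))
        remaining -= hi - lo + 1
    chosen.sort()               # present fragments in document order
    return [" ".join(tokens[s:e + 1]) for s, e in chosen]


doc = "1 2 3 4 5 3 4 abc x y z 2 3".split()
covers = [(1, 2), (5, 11), (11, 12)]   # covers of '2&3', 0-based
print(build_fragments(doc, covers, 4, 2, 8))
# ['1 2 3 4', 'y z 2 3']
```

With a budget of 8 words and fragments of 4 words the sketch yields two fragments, which mirrors the roughly two fragments per query I see with 20 and 40.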

If Teodor or Oleg want to add this to the main branch, I will be happy to
clean it up and test it better.

-Sushant.




On Oct 31, 2007 6:26 PM, Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com> wrote:
> On 30/10/2007, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> > ok, then you have to formalize many things - how long the excerpts should be,
> > how many excerpts to show, etc. In tsearch2 we have the get_covers() function,
> > which produces all excerpts like:
> >
> > =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'), '2&3'::tsquery);
> >                     get_covers
> > ------------------------------------------------
> >   1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> > (1 row)
>
> This function generates the lexemes, so it cannot be used directly, but
> it is probably a good starting point.
>
> > Once you formalize your requirements, you can look at it and adapt it to your
> > needs (and share with people). I think it could be a nice contrib module.
>
> It seems that Sushant already wants to implement this function. He
> would probably be faster than me :-) (I'm relatively new to db stuff).
> Since I mainly rely on whatever a web hosting company provides, I'll
> probably stick with a Python implementation outside the SQL query.
>
> Thanks for your answers.
>
> --
> Catalin
>

Attachment: headline_with_fragments.patch
Description: text/x-patch (11.1 KB)
