Re: Fragments in tsearch2 headline

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Sushant Sinha <sushant354(at)gmail(dot)com>
Cc: Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Richard Huxton <dev(at)archonet(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Fragments in tsearch2 headline
Date: 2008-03-17 18:27:44
Message-ID: 200803171827.m2HIRiM08492@momjian.us
Lists: pgsql-general


Teodor, Oleg, do we want this?

http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php

---------------------------------------------------------------------------

Sushant Sinha wrote:
> I wrote a headline generation function for my app and I have attached
> the patch (against the cvs head). It generates multiple contexts in
> which the query appears. Essentially, it uses the cover function to
> generate all covers, chooses the smallest covers, and stretches each
> selected cover according to the chosen parameters. Ideally, I think the
> changes should be made to the prsd_headline function, but I couldn't
> understand that segment of code well.
>
> The SQL interface is
>
> headline_with_fragments(text parser, tsvector docvector, text doc,
> tsquery queryin, int4 maxcoverSize, int4 mincoverSize, int4 maxWords)
> RETURNS text
>
> This will generate a headline that contains maxWords words, with each
> cover stretched to maxcoverSize. It will not add any fragment shorter
> than mincoverSize.
> I am running my app with maxcoverSize = 20, mincoverSize = 5, maxWords = 40.
> So it shows roughly two fragments per query.
>
> If Teodor or Oleg want to add this to the main branch, I will be happy
> to clean it up and test it better.
>
> -Sushant.
>
>
>
>
> On Oct 31, 2007 6:26 PM, Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com> wrote:
> > On 30/10/2007, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> > > ok, then you have to formalize many things - how long excerpts should be,
> > > how many excerpts to show, etc. In tsearch2 we have the get_covers() function,
> > > which produces all excerpts like:
> > >
> > > =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'), '2&3'::tsquery);
> > > get_covers
> > > ------------------------------------------------
> > > 1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> > > (1 row)
> >
> > This function generates the lexemes, so it cannot be used directly, but
> > it is probably a good starting point.
> >
> > > Once you formalize your requirements, you can look at it and adapt it to your
> > > needs (and share it with people). I think it could be a nice contrib module.
> >
> > It seems that Sushant already wants to implement this function. He
> > would probably be faster than me :-) (I'm relatively new to db stuff).
> > Since I mainly rely on whatever a web hosting company provides, I'll
> > probably stick with a Python implementation outside the SQL query.
> >
> > Thanks for your answers.
> >
> > --
> > Catalin
> >

[ Attachment, skipping... ]
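
The attached patch is not reproduced here, so the following is only a
minimal Python sketch of the approach described in the quoted message:
find all covers, prefer the smallest ones, stretch each with surrounding
context, and stop at the word budget. It is not the patch's C code.
Assumptions: the query is a plain AND of terms, matching is exact on
whitespace-separated words rather than on lexemes, a fragment that would
overlap one already chosen is skipped, and the parameter names simply
mirror those in the proposed SQL interface.

```python
def find_covers(words, terms):
    """Return (start, end) index pairs of minimal windows holding every term."""
    terms = set(terms)
    last_seen = {}  # term -> index of its most recent occurrence
    covers = []
    for end, word in enumerate(words):
        if word not in terms:
            continue
        last_seen[word] = end
        if len(last_seen) < len(terms):
            continue  # some term has not appeared yet
        start = min(last_seen.values())
        # The same start with a later end would be a longer, non-minimal window.
        if not covers or covers[-1][0] != start:
            covers.append((start, end))
    return covers


def headline_with_fragments(doc, query_terms, max_cover_size=20,
                            min_cover_size=5, max_words=40):
    """Build a headline from the smallest covers, each stretched with context."""
    words = doc.split()
    # Smallest covers first (assumed reading of "chooses smallest covers").
    covers = sorted(find_covers(words, query_terms), key=lambda c: c[1] - c[0])
    fragments = []
    used = 0
    for start, end in covers:
        length = end - start + 1
        budget = min(max_cover_size, max_words - used)
        if length > budget:
            continue  # cover does not fit in the remaining budget
        # Stretch the cover with context words on both sides.
        extra = budget - length
        right = min(len(words) - 1 - end, extra - extra // 2)
        left = min(start, extra - right)
        lo, hi = start - left, end + right
        if hi - lo + 1 < min_cover_size:
            continue  # fragment too short to be worth showing
        if any(lo <= f_hi and f_lo <= hi for f_lo, f_hi in fragments):
            continue  # assumption: skip fragments that overlap earlier ones
        fragments.append((lo, hi))
        used += hi - lo + 1
    return " ... ".join(" ".join(words[lo:hi + 1]) for lo, hi in sorted(fragments))


doc = "1 2 3 4 5 3 4 abc x y z 2 3"
print(find_covers(doc.split(), {"2", "3"}))
# [(1, 2), (5, 11), (11, 12)]
print(headline_with_fragments(doc, {"2", "3"}, max_cover_size=4,
                              min_cover_size=2, max_words=8))
# 1 2 3 4 ... y z 2 3
```

On the document from Oleg's get_covers() example, the three windows found
correspond to the three bracketed covers in that output, with 0-based
word positions.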


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
