Re: [GENERAL] Fragments in tsearch2 headline

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: sushant354(at)gmail(dot)com
Cc: Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Fragments in tsearch2 headline
Date: 2008-06-02 14:10:21
Message-ID: 4843FF4D.8030109@sigaev.ru
Lists: pgsql-general, pgsql-hackers
> I have attached a new patch with respect to the current cvs head. This
> produces headline in a document for a given query. Basically it
> identifies fragments of text that contain the query and displays them.
The new variant is much better, but...

>  HeadlineParsedText contains an array of  actual words but not
> information about the norms. We need an indexed position vector for each
> norm so that we can quickly evaluate a number of possible fragments.
> Something that tsvector provides.

Why do you need to store norms? The only purpose of norms is to identify words
from the query, but that is already done by hlfinditem. It sets
HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.
Look, the headline function is rather expensive, and your patch adds a lot of
extra work, at least in memory usage. And if the user calls it with
NumFragments=0, then that work is unneeded.

> This approach does not change any other interface and fits nicely with
> the overall framework.
Yeah, it's a really big step forward. Thank you. You are very close to
being committed, except: did you find the hlCover() function, which produces a
cover from the original HeadlineParsedText representation? Is there any reason
not to use it?
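
To illustrate the point, here is a minimal sketch of finding a cover directly from per-word item pointers, without building a separate position vector. The types Operand and Word and the function find_cover are hypothetical stand-ins, not the real QueryOperand, HeadlineWordEntry or hlCover(). The sketch also assumes a pure AND query, where a cover is a window containing every operand; the real code has to evaluate the full tsquery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPERANDS 32

/* Hypothetical stand-in for QueryOperand. */
typedef struct { const char *text; } Operand;

/* Hypothetical stand-in for HeadlineWordEntry: item is NULL unless the
 * word matched a query operand (the job hlfinditem does in the real code). */
typedef struct { const char *word; const Operand *item; } Word;

/*
 * Find the first window [*p, *q] at or after 'start' that contains every
 * operand at least once (assumption: AND-only query). Each item pointer
 * must point into the ops array. Returns false if no such window exists.
 */
static bool
find_cover(const Word *words, int nwords,
           const Operand *ops, int nops,
           int start, int *p, int *q)
{
    bool seen[MAX_OPERANDS] = {false};
    int  last[MAX_OPERANDS];
    int  nseen = 0;
    int  i, j;

    assert(nops > 0 && nops <= MAX_OPERANDS);
    for (j = 0; j < nops; j++)
        last[j] = -1;

    /* Advance the right edge until every operand has been seen. */
    for (i = start; i < nwords; i++)
    {
        if (words[i].item == NULL)
            continue;
        j = (int) (words[i].item - ops);
        assert(j >= 0 && j < nops);
        if (!seen[j])
        {
            seen[j] = true;
            nseen++;
        }
        last[j] = i;            /* remember the latest occurrence */
        if (nseen == nops)
            break;
    }
    if (nseen < nops)
        return false;

    /* The left edge is the earliest of the latest occurrences. */
    *q = i;
    *p = last[0];
    for (j = 1; j < nops; j++)
        if (last[j] < *p)
            *p = last[j];
    return true;
}
```

For the words "the quick brown quick fox jumps" and the operands "quick" and "fox", a call with start = 0 yields the window [3, 4], that is, "quick fox". The only per-word state used is the item pointer, so no extra array of norms is needed.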

> 
> The norms are converted into tsvector and a number of covers are
> generated. The best covers are then chosen to be in the headline. The
> covers are separated using a hardcoded coversep. Let me know if you want
> to expose this as an option.


> 
> Covers that overlap with already chosen covers are excluded.
> 
> Some options like ShortWord and MinWords are not taken care of right
> now. MaxWords are used as maxcoversize. Let me know if you would like to
> see other options for fragment generation as well.
ShortWord, MinWords and MaxWords should keep their meaning, but applied to
each fragment, not to the whole headline.
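
One possible reading of "per fragment" is sketched below: each chosen cover is fitted to the length limits on its own. The struct FragmentOptions and the function fit_fragment are hypothetical names, and the specific policy (cut on the right, grow right then left, trim trailing short words) is only an assumption for illustration, not what the patch or the existing mark_hl_words does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option bundle; mirrors MinWords, MaxWords, ShortWord. */
typedef struct
{
    int min_words;
    int max_words;
    int shortword;
} FragmentOptions;

/*
 * Adjust the fragment [*p, *q] (word indexes, inclusive) so that it
 * respects the options for this fragment alone.
 */
static void
fit_fragment(const char *const *words, int nwords,
             const FragmentOptions *opt, int *p, int *q)
{
    assert(0 <= *p && *p <= *q && *q < nwords);
    assert(0 < opt->min_words && opt->min_words <= opt->max_words);

    /* Too long: keep only the first max_words words. */
    if (*q - *p + 1 > opt->max_words)
        *q = *p + opt->max_words - 1;

    /* Too short: grow to the right first, then to the left. */
    while (*q - *p + 1 < opt->min_words && *q < nwords - 1)
        (*q)++;
    while (*q - *p + 1 < opt->min_words && *p > 0)
        (*p)--;

    /* Drop trailing words of length <= shortword, but never go
     * below min_words. */
    while (*q - *p + 1 > opt->min_words &&
           (int) strlen(words[*q]) <= opt->shortword)
        (*q)--;
}
```

With the words "a fast brown fox is in the yard" and options min_words = 3, max_words = 5, shortword = 2, the one-word fragment [1, 1] grows to [1, 3] ("fast brown fox"), while the whole-text fragment [0, 7] is cut to five words and then loses the trailing "is", giving [0, 3].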


> 
> Let me know any more changes you would like to see.

         if (num_fragments == 0)
             /* call the default headline generator */
             mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
         else
             mark_hl_fragments(prs, query, highlight, num_fragments, max_words);


Suppose, num_fragments < 2?

-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/
