Re: [GENERAL] Fragments in tsearch2 headline

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: sushant354(at)gmail(dot)com, Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Fragments in tsearch2 headline
Date: 2008-06-03 18:53:06
Message-ID: 48459312.9070505@sigaev.ru
Lists: pgsql-general, pgsql-hackers
> Why we need norms?

We don't need norms at all: every matched HeadlineWordEntry is already marked
via HeadlineWordEntry->item. If item is NULL, the word is not contained in the
tsquery.
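
To illustrate the point, here is a minimal sketch of testing the item pointer
instead of keeping separate norms. The struct and function names below are
simplified, hypothetical stand-ins, not the real tsearch definitions; only the
"item is NULL means no match" rule comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real tsearch structures. */
typedef struct QueryItemSketch {
    const char *lexeme;
} QueryItemSketch;

typedef struct HeadlineWordSketch {
    const char            *word;
    const QueryItemSketch *item;   /* NULL when the word is not in the query */
} HeadlineWordSketch;

/* A word matches the query exactly when its item pointer is set. */
static int word_matches_query(const HeadlineWordSketch *w)
{
    return w->item != NULL;
}

/* Count the matched words in an array of headline words. */
static size_t count_matched(const HeadlineWordSketch *words, size_t n)
{
    size_t matched = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (word_matches_query(&words[i]))
            matched++;
    return matched;
}
```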

> hlCover does the exact thing that Cover in tsrank does which is to find
> the  cover that contains the query. However hlcover has to go through
> words that do not match the query. Cover on the other hand operates on
> position indexes for just the query words and so it should be faster. 
A cover, by definition, is a minimal contiguous piece of text that matches the
query. A text may contain several covers, and hlCover will find all of them.
Next, prsd_headline() (for now) tries to pick the best one. "Best" means: the
cover contains many words from the query, has no fewer than MinWords and no more
than MaxWords words, has no words shorter than ShortWord at its beginning or
end, etc.
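
The selection criteria above can be sketched roughly as follows. This is not
the actual prsd_headline() logic (which also stretches and trims covers); it
only filters candidate covers by the stated limits and keeps the one with the
most query words. All types and function names are hypothetical. Two
assumptions are made: "short" is taken to mean "ShortWord characters or fewer",
and a short word at an edge is tolerated when it is itself a query term.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types: not the real tsearch structures. */
typedef struct Word {
    const char *text;
    int         in_query;   /* nonzero if the word matches a query term */
} Word;

typedef struct Cover {
    size_t start;           /* index of the first word, inclusive */
    size_t end;             /* index of the last word, inclusive */
} Cover;

/* Number of query words inside the cover. */
static int count_query_words(const Word *w, Cover c)
{
    int n = 0;

    for (size_t i = c.start; i <= c.end; i++)
        if (w[i].in_query)
            n++;
    return n;
}

/* Assumption: a short word disqualifies an edge only if it is not a query term. */
static int is_short_edge(const Word *w, size_t short_word)
{
    return !w->in_query && strlen(w->text) <= short_word;
}

/* Check the MinWords / MaxWords / ShortWord constraints for one cover. */
static int cover_is_acceptable(const Word *w, Cover c,
                               size_t min_words, size_t max_words,
                               size_t short_word)
{
    size_t len;

    assert(c.start <= c.end);
    len = c.end - c.start + 1;
    if (len < min_words || len > max_words)
        return 0;
    return !is_short_edge(&w[c.start], short_word) &&
           !is_short_edge(&w[c.end], short_word);
}

/* Index of the acceptable cover with the most query words
 * (first one wins ties), or -1 if none is acceptable. */
static int best_cover(const Word *w, const Cover *covers, int ncovers,
                      size_t min_words, size_t max_words, size_t short_word)
{
    int best = -1;
    int best_score = -1;

    for (int i = 0; i < ncovers; i++)
    {
        int score;

        if (!cover_is_acceptable(w, covers[i], min_words, max_words, short_word))
            continue;
        score = count_query_words(w, covers[i]);
        if (score > best_score)
        {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```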
> 
> The main reason why I would I like it to be fast is that I want to
> generate all covers for a given query. Then choose covers with smallest
hlCover generates all covers.

> Let me know what you think on this patch and I will update the patch to
> respect other options like MinWords and ShortWord. 

As I understand it, you really want to call the Cover() function instead of
hlCover(). By design they should be identical, but they accept different
representations of the document. So the best way is to generalize them: develop
a new function that can be called with some kind of callback and/or opaque
structure, so that it can be used in both rank and headline.
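
One possible shape for such a generalized function is sketched below. It is
only an illustration: the names (term_at_fn, next_cover, MAX_TERMS) are
hypothetical, the document is reached solely through a callback plus an opaque
pointer, and the query is assumed to be a plain AND of terms, whereas a real
tsquery can also contain OR and NOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TERMS 32   /* arbitrary limit for this sketch */

/* Callback: index of the query term found at position pos of the opaque
 * document, or -1 if the word there is not a query term. */
typedef int (*term_at_fn)(const void *doc, size_t pos);

typedef struct CoverSpan {
    size_t start;      /* inclusive */
    size_t end;        /* inclusive */
} CoverSpan;

/* Example callback for a document stored as an array of term ids. */
static int term_at_ids(const void *doc, size_t pos)
{
    return ((const int *) doc)[pos];
}

/* Find the first minimal cover of all nterms terms at or after `from`.
 * Returns false when no further cover exists. */
static bool next_cover(const void *doc, size_t doc_len, term_at_fn term_at,
                       int nterms, size_t from, CoverSpan *out)
{
    bool   seen[MAX_TERMS] = { false };
    int    missing = nterms;
    size_t start, end;

    assert(term_at != NULL && out != NULL);
    if (nterms <= 0 || nterms > MAX_TERMS)
        return false;

    /* Forward scan: smallest end such that [from, end] holds every term. */
    for (end = from; end < doc_len; end++)
    {
        int t = term_at(doc, end);

        if (t >= 0 && t < nterms && !seen[t])
        {
            seen[t] = true;
            if (--missing == 0)
                break;
        }
    }
    if (missing != 0)
        return false;

    /* Backward scan: largest start such that [start, end] holds every term. */
    for (int i = 0; i < nterms; i++)
        seen[i] = false;
    missing = nterms;
    for (start = end;; start--)
    {
        int t = term_at(doc, start);

        if (t >= 0 && t < nterms && !seen[t])
        {
            seen[t] = true;
            if (--missing == 0)
                break;
        }
        if (start == from)
            break;     /* defensive; the forward scan guarantees a hit */
    }

    out->start = start;
    out->end = end;
    return true;
}
```

To enumerate every cover, call next_cover() repeatedly with from set to the
previous cover's start + 1. A rank-style caller and a headline-style caller
would differ only in the callback and the structure behind the opaque pointer.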

> 
> NumFragments < 2:
> I wanted people to use the new headline marker if they specify
> NumFragments >= 1. If they do not specify the NumFragments or put it to
OK, but if you unify cover generation and NumFragments == 1, then the results
of the old and new algorithms should be the same...


> On an another note I found that make_tsvector crashes if it receives a
> ParsedText with curwords = 0. Specifically uniqueWORD returns curwords
> as 1 even when it gets 0 words. I am not sure if this is the desired
> behavior.
In all places there is a check before the call to make_tsvector.
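
A caller-side check of that kind might look like the sketch below. The types
and functions are hypothetical stand-ins, not the real ParsedText or
make_tsvector; only the curwords field name and the "check before the call"
idea come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parsed text and the resulting vector. */
typedef struct ParsedTextSketch {
    size_t curwords;    /* number of parsed words */
} ParsedTextSketch;

typedef struct VectorSketch {
    size_t nlexemes;
} VectorSketch;

/* Stand-in for the builder: it requires at least one word. */
static VectorSketch build_vector(const ParsedTextSketch *prs)
{
    VectorSketch v;

    assert(prs->curwords > 0);
    v.nlexemes = prs->curwords;
    return v;
}

/* Caller-side guard: return an empty vector rather than calling the
 * builder with zero words. */
static VectorSketch build_vector_checked(const ParsedTextSketch *prs)
{
    if (prs->curwords == 0)
    {
        VectorSketch empty = { 0 };

        return empty;
    }
    return build_vector(prs);
}
```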

-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/
