Re: [GENERAL] Fragments in tsearch2 headline

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: sushant354(at)gmail(dot)com
Cc: Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] Fragments in tsearch2 headline
Date: 2008-05-27 09:30:51
Message-ID: 483BD4CB.3030006@sigaev.ru
Lists: pgsql-general pgsql-hackers

Hi!

> 1. Why is hlparsetext used to parse the document rather than the
> parsetext function? Since words to be included in the headline will be
> marked afterwards, it seems more reasonable to just use the parsetext
> function.
> The main difference I see is the use of hlfinditem and marking whether
> some word is repeated.
hlparsetext preserves every kind of token, including non-indexed ones, spaces,
etc.; parsetext doesn't.
hlparsetext also preserves the original form of each lexeme; parsetext doesn't.

>
> The reason this is important is that hlparsetext does not seem to be
> storing word positions which parsetext does. The word positions are
> important for generating headline with fragments.
That isn't needed: hlparsetext preserves the whole text, so a word's position
is simply its index in the array.
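
To illustrate the point about positions, here is a minimal Python sketch of a
parser that keeps every token. It is a toy model only: hl_style_parse and
word_positions are hypothetical names, and the regular expression stands in
for the real parser, which is written in C and is far more elaborate.

```python
import re


def hl_style_parse(text):
    """Toy stand-in for a headline parser: keep every token, spaces included."""
    # Alternate between runs of word characters and runs of everything else,
    # so that no character of the input is dropped.
    return re.findall(r"\w+|\W+", text)


def word_positions(tokens):
    """Return (array index, token) pairs for the non-blank tokens."""
    # The position of a word is just its index in the token array.
    return [(i, tok) for i, tok in enumerate(tokens) if tok.strip()]


tokens = hl_style_parse("The quick  brown fox")
print(word_positions(tokens))
# Nothing was discarded, so joining the array restores the original text.
print("".join(tokens) == "The quick  brown fox")
```

Because the array holds the whole text, no separate position field is needed
in this model; the index serves that purpose.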

>
> 2.
>> I would prefer the signature ts_headline( [regconfig,] text, tsquery
>> [,text] ) and the function should accept 'NumFragments=>N' for the default
>> parser. Other parsers may use other options.
>
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified?

The trigger should be inside the parser-specific function
(pg_ts_parser.prsheadline). Other parsers might not recognize that option.

> It seems that introducing a new
> function which can take configuration OID, or name is complex as there
> are so many functions handling these issues in wparser.c.
No, of course not: ts_headline takes care of finding the configuration and
calling the correct parser.
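
The division of labour described in the two replies above could be sketched
roughly as follows, in Python for brevity. Everything here is an assumption
made for illustration: the 'Key=Value' option syntax, the PRSHEADLINE table
standing in for the pg_ts_parser.prsheadline lookup, and both headline
functions. The real code is C and takes a configuration, not a parser name.

```python
def parse_options(options):
    """Parse a 'Key=Value, Key=Value' string into a dict (assumed syntax)."""
    result = {}
    for part in options.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            result[key.strip().lower()] = value.strip()
    return result


def default_headline(words, options):
    """Hypothetical default-parser headline function: understands NumFragments."""
    opts = parse_options(options)
    num_fragments = int(opts.get("numfragments", "0"))
    # The fragment logic is triggered here, inside the parser-specific code.
    mode = "fragments" if num_fragments > 0 else "single"
    return {"mode": mode, "num_fragments": num_fragments}


def other_headline(words, options):
    """Hypothetical second parser: has no notion of fragments at all."""
    return {"mode": "single", "num_fragments": 0}


# Stand-in for looking up prsheadline in pg_ts_parser: name -> function.
PRSHEADLINE = {"default": default_headline, "other": other_headline}


def ts_headline(parser_name, words, options=""):
    """Generic entry point: find the parser's function, pass options through."""
    return PRSHEADLINE[parser_name](words, options)


print(ts_headline("default", [], "NumFragments=2"))
print(ts_headline("other", [], "NumFragments=2"))
```

The generic function never inspects the option string itself, so a new
option needs no new SQL-level function in this model.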

>
> If this is true then we need to just add marking of headline words in
> prsd_headline. Otherwise we will need another prsd_headline_with_covers
> function.
Yeah, pg_ts_parser.prsheadline should mark the lexemes too. It can even change
the array in HeadlineParsedText.

>
> 3. In many cases people may already have TSVector for a given document
> (for search operation). Would it be faster to pass TSVector to headline
> function when compared to computing TSVector each time? If that is the
> case then should we have an option to pass TSVector to headline
> function?
As I mentioned above, a tsvector doesn't contain all the information about the
text.
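
A small Python model shows why a tsvector-like structure is not enough to
build a headline. It is only a sketch: toy_tsvector is a hypothetical helper,
the stop-word list is invented for the example, and real normalization is
done by dictionaries rather than by lowercasing.

```python
STOP_WORDS = {"the", "a", "of"}  # illustrative list, not PostgreSQL's


def toy_tsvector(text):
    """Map each normalized lexeme to its 1-based word positions."""
    vector = {}
    for pos, word in enumerate(text.split(), start=1):
        # Normalization discards case and punctuation; stop words are
        # counted for position but not stored.
        lexeme = word.strip(".,!?").lower()
        if lexeme and lexeme not in STOP_WORDS:
            vector.setdefault(lexeme, []).append(pos)
    return vector


first = toy_tsvector("The Cat sat.")
second = toy_tsvector("a cat   SAT!")
# Two different documents produce the same vector, so the original
# wording, spacing and punctuation cannot be recovered from it.
print(first == second)
```

A headline has to quote the document as written, which is exactly the
information this structure throws away.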

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
