From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | sushant354(at)gmail(dot)com |
Cc: | Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [GENERAL] Fragments in tsearch2 headline |
Date: | 2008-05-27 09:30:51 |
Message-ID: | 483BD4CB.3030006@sigaev.ru |
Lists: | pgsql-general pgsql-hackers |
Hi!
> 1. Why is hlparsetext used to parse the document rather than the
> parsetext function? Since words to be included in the headline will be
> marked afterwards, it seems more reasonable to just use the parsetext
> function.
> The main difference I see is the use of hlfinditem and marking whether
> some word is repeated.
hlparsetext preserves every kind of lexeme, including ones that are not indexed,
spaces, etc. parsetext doesn't.
hlparsetext also preserves the original form of each lexeme. parsetext doesn't.
>
> The reason this is important is that hlparsetext does not seem to be
> storing word positions which parsetext does. The word positions are
> important for generating headline with fragments.
That isn't needed: hlparsetext preserves the whole text, so a word's position is
simply its index in the array.
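
To illustrate the point about positions, here is a minimal sketch in plain C. The `Token` struct and both helper functions are hypothetical, simplified stand-ins and are not the actual tsearch structures. It only shows that when every token is kept (spaces included), the array index already serves as the position, and word distances can be derived from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of the parsed text. */
typedef struct {
    const char *text;    /* original form, exactly as it appeared */
    int         is_word; /* 1 for words, 0 for spaces and punctuation */
} Token;

/* Return the array index of the first word token equal to `word`
 * at or after `start`, or -1 if there is none. */
int find_token(const Token *tokens, size_t ntokens, size_t start,
               const char *word)
{
    assert(tokens != NULL && word != NULL);
    for (size_t i = start; i < ntokens; i++) {
        if (tokens[i].is_word && strcmp(tokens[i].text, word) == 0)
            return (int) i;
    }
    return -1;
}

/* Count the word tokens strictly between two array indexes. No separately
 * stored positions are needed: the indexes are the positions. */
int words_between(const Token *tokens, int from, int to)
{
    int n = 0;
    for (int i = from + 1; i < to; i++) {
        if (tokens[i].is_word)
            n++;
    }
    return n;
}
```

A fragment-building routine could use indexes like these to decide where a cover starts and ends, and then copy out the tokens in between with their original spacing.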
>
> 2.
>> I would prefer the signature ts_headline( [regconfig,] text, tsquery
>> [,text] ), and the function should accept 'NumFragments=>N' for the default
>> parser. Other parsers may use other options.
>
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified?
The trigger should be inside the parser-specific function (pg_ts_parser.prsheadline).
Other parsers might not recognize that option.
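
As a rough sketch of what "the trigger lives inside the parser-specific function" could look like, the plain C fragment below reads a `Key=Value, Key=Value` option string and switches to fragment mode only when `NumFragments` is present. Everything here is an assumption made for illustration: the `HeadlineOptions` struct, the function names, the simplified `=` syntax without surrounding spaces, and the default of 35 for `MaxWords`. It is not the real prsd_headline code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set understood by one particular parser. */
typedef struct {
    int max_words;      /* assumed default: 35 */
    int num_fragments;  /* 0 means "no fragments", i.e. the old behaviour */
} HeadlineOptions;

/* Compare a non-terminated key of length `len` with a C string. */
static int key_is(const char *key, size_t len, const char *name)
{
    return strlen(name) == len && strncmp(key, name, len) == 0;
}

/* Parse "Key=Value, Key=Value". Keys this parser does not know are ignored,
 * so a different parser is free to define different options. */
HeadlineOptions parse_headline_options(const char *opts)
{
    HeadlineOptions o = { 35, 0 };
    const char *p = opts;

    assert(opts != NULL);
    while (*p != '\0') {
        while (*p == ' ' || *p == ',')      /* skip separators */
            p++;
        const char *key = p;
        while (*p != '\0' && *p != '=' && *p != ',')
            p++;
        size_t keylen = (size_t) (p - key);
        if (*p != '=')                      /* key without a value: skip it */
            continue;
        p++;

        char *end;
        long value = strtol(p, &end, 10);
        p = end;

        if (key_is(key, keylen, "MaxWords"))
            o.max_words = (int) value;
        else if (key_is(key, keylen, "NumFragments"))
            o.num_fragments = (int) value;

        while (*p != '\0' && *p != ',')     /* move on to the next pair */
            p++;
    }
    return o;
}
```

With this shape, the generic entry point never has to know about `NumFragments`: it passes the options through, and only the parser that understands the key changes its behaviour when `num_fragments` is greater than zero.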
> It seems that introducing a new
> function which can take configuration OID, or name is complex as there
> are so many functions handling these issues in wparser.c.
No, of course not: ts_headline takes care of finding the configuration and calling
the correct parser.
>
> If this is true then we need to just add marking of headline words in
> prsd_headline. Otherwise we will need another prsd_headline_with_covers
> function.
Yeah, pg_ts_parser.prsheadline should mark the lexemes too. It can even change
the array of HeadlineParsedText.
>
> 3. In many cases people may already have TSVector for a given document
> (for search operation). Would it be faster to pass TSVector to headline
> function when compared to computing TSVector each time? If that is the
> case then should we have an option to pass TSVector to headline
> function?
As I mentioned above, a tsvector doesn't contain all the information about the text.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/