From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fragments in tsearch2 headline |
Date: | 2008-05-24 03:57:16 |
Message-ID: | 4837921C.8000905@sigaev.ru |
Lists: | pgsql-general pgsql-hackers |
[moved to -hackers, because the discussion is about implementation details]
> I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
> (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)
Thank you.
1 > diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
contrib/tsearch2 is now a compatibility layer for old applications, which don't
know about new features. So this part isn't needed.
2 The solution of compiling the function (ts_headline_with_fragments) into core
but using it only from the contrib module looks very odd. The new feature could
then be used only through the compatibility layer for old releases :)
3 headline_with_fragments() is hardcoded to use the default parser, but what
happens when the configuration uses another parser, for example one for Japanese?
4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] ),
and the function should accept 'NumFragments=>N' for the default parser. Other
parsers may use other options.
5 It just doesn't work correctly, because the new code doesn't take the
parser-specific types of lexemes into account:
contrib_regression=# select headline_with_fragments('english', 'wow asd-wow
wow', 'asd', '');
headline_with_fragments
----------------------------------
...wow asd-wow<b>asd</b>-wow wow
(1 row)
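
One plausible reading of the output above, sketched as a toy model in Python: if the parser reports a hyphenated word both as a whole token and again as its parts, then gluing every token back without looking at its type repeats the text. The token list and the type names ("compound", "part", "blank") are assumptions made for illustration only, not PostgreSQL's actual token types or headline code.

```python
# Toy token stream for 'wow asd-wow wow' (hypothetical type names).
TOKENS = [
    ("word", "wow"), ("blank", " "),
    ("compound", "asd-wow"),                            # whole hyphenated word
    ("part", "asd"), ("blank", "-"), ("part", "wow"),   # same text, split up
    ("blank", " "), ("word", "wow"),
]


def naive_headline(tokens, query):
    """Glue every token back into text, ignoring token types."""
    return "".join(f"<b>{text}</b>" if text == query else text
                   for _, text in tokens)


def type_aware_headline(tokens, query):
    """Skip the compound token so its text appears once, via its parts."""
    out = []
    for kind, text in tokens:
        if kind == "compound":
            continue
        hit = kind != "blank" and text == query
        out.append(f"<b>{text}</b>" if hit else text)
    return "".join(out)


naive = naive_headline(TOKENS, "asd")
aware = type_aware_headline(TOKENS, "asd")
print(naive)
print(aware)
```

In this model the naive version reproduces the duplicated "asd-wow" seen above, while the type-aware version emits the hyphenated word only once.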
So I am inclined to use the existing framework/infrastructure, although it may
be subject to change.
Some description:
1 ts_headline determines the correct parser to use
2 it calls hlparsetext to split the text into a structure suitable for both goals:
finding the best fragment(s) and concatenating those fragment(s) back into a text
representation
3 it calls the parser-specific method prsheadline, which works with the preparsed
text (parsing was done in hlparsetext). The method should mark the needed
words/parts/lexemes etc.
4 ts_headline glues the fragments into text and returns it.
We need a parser's headline method because only the parser knows everything
about its lexemes.
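
The four steps above can be sketched roughly as follows. This is a minimal Python model of the control flow only, not the C implementation: the Word class, the PARSERS registry, and the function names default_parse, default_headline and ts_headline_sketch are all hypothetical stand-ins chosen for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    is_word: bool            # False for whitespace and punctuation
    selected: bool = False   # set by the parser's headline method


def default_parse(text):
    """Hypothetical parser: split text into words and separators."""
    return [Word(m, m.isalnum()) for m in re.findall(r"\w+|\W+", text)]


def default_headline(words, query_terms, options):
    """Hypothetical parser-specific method: mark words matching the query."""
    for w in words:
        if w.is_word and w.text.lower() in query_terms:
            w.selected = True


# Configuration name -> (parse function, headline method).
PARSERS = {"english": (default_parse, default_headline)}


def ts_headline_sketch(config, text, query_terms, options=None):
    parse, headline = PARSERS[config]             # 1: pick the parser
    words = parse(text)                           # 2: pre-parse the text
    headline(words, query_terms, options or {})   # 3: parser marks words
    return "".join(                               # 4: glue back into text
        f"<b>{w.text}</b>" if w.selected else w.text for w in words
    )


result = ts_headline_sketch("english", "wow asd wow", {"asd"})
print(result)
```

The point of the registry is that a configuration with a different parser would supply its own parse and headline functions, and could interpret the options in its own way, without any change to the outer function.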
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/