Re: Fragments in tsearch2 headline

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pierre-Yves Strub <pierre(dot)yves(dot)strub(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fragments in tsearch2 headline
Date: 2008-05-24 03:57:16
Message-ID: 4837921C.8000905@sigaev.ru
Lists: pgsql-general pgsql-hackers

[moved to -hackers, because the discussion is about implementation details]

> I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
> (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)
Thank you.

1 > diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
contrib/tsearch2 is now a compatibility layer for old applications, which don't
know about new features. So this part isn't needed.

2 The solution of compiling the function (ts_headline_with_fragments) into core
but using it only from the contrib module looks very odd: the new feature could
then be used only through the compatibility layer for the old release :)

3 headline_with_fragments() is hardcoded to use the default parser, but what
happens when the configuration uses another parser? For example, one for the
Japanese language.

4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] ),
and the function should accept 'NumFragments=>N' for the default parser. Other
parsers may use other options.
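
To illustrate point 4, here is a minimal sketch of how an option such as
'NumFragments=>N' could be read from the optional text argument. The helper
name headline_option_int and the comma-separated "key=>value" layout are
assumptions made for this example only; this is not the PostgreSQL option
parser.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: look up an integer option such as
 * "NumFragments=>3" in a comma-separated option string. Returns fallback when
 * the key is absent or has no numeric value.
 */
static int
headline_option_int(const char *opts, const char *key, int fallback)
{
    size_t      keylen = strlen(key);
    const char *p = opts;

    while (*p != '\0')
    {
        /* skip separators in front of the next "key=>value" item */
        while (*p == ',' || isspace((unsigned char) *p))
            p++;

        if (strncmp(p, key, keylen) == 0)
        {
            const char *q = p + keylen;

            while (isspace((unsigned char) *q))
                q++;
            if (*q == '=')
            {
                char   *end;
                long    val;

                q++;
                if (*q == '>')      /* accept both "=" and "=>" */
                    q++;
                val = strtol(q, &end, 10);
                if (end != q)       /* at least one digit was consumed */
                    return (int) val;
            }
        }

        /* move on to the next item */
        while (*p != '\0' && *p != ',')
            p++;
    }
    return fallback;
}
```

A parser's headline method could call this with its own default, for example
headline_option_int(opts, "NumFragments", 0), and ignore keys it does not
know, which leaves other parsers free to define other options.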

5 It just doesn't work correctly, because the new code doesn't take the
parser-specific types of lexemes into account.
contrib_regression=# select headline_with_fragments('english', 'wow asd-wow
wow', 'asd', '');
headline_with_fragments
----------------------------------
...wow asd-wow<b>asd</b>-wow wow
(1 row)

So I am inclined to use the existing framework/infrastructure, although it may
be subject to change.

Some description:
1 ts_headline determines the correct parser to use
2 it calls hlparsetext to split the text into a structure suitable for both
goals: finding the best fragment(s) and concatenating those fragment(s) back
into the text representation
3 it calls the parser-specific method prsheadline, which works on the preparsed
text (parsing was done in hlparsetext). The method should mark the needed
words/parts/lexemes etc.
4 ts_headline glues the fragments into text and returns it.

We need a parser's headline method because only the parser knows all about its
lexemes.
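
The steps above can be sketched as a toy pipeline: parse into tokens, let a
parser-specific step mark tokens, then glue everything back into text. All
names here (Token, toy_parse, toy_mark, toy_glue) are hypothetical and the
interfaces are much simpler than those of hlparsetext and prsheadline; the
point is only the division of labour, not the real code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64

/* One piece of the original text; together the pieces cover the whole text. */
typedef struct
{
    const char *start;      /* points into the original text */
    size_t      len;
    int         is_word;    /* toy "token type": word vs. separator */
    int         selected;   /* set by the marking step */
} Token;

/* Step 2: split the text into tokens that can later be glued back together. */
static int
toy_parse(const char *text, Token *toks, int max)
{
    int         n = 0;
    const char *p = text;

    while (*p != '\0' && n < max)
    {
        int         word = isalnum((unsigned char) *p) != 0;
        const char *q = p;

        /* extend the token while the character class stays the same */
        while (*q != '\0' && (isalnum((unsigned char) *q) != 0) == word)
            q++;
        toks[n].start = p;
        toks[n].len = (size_t) (q - p);
        toks[n].is_word = word;
        toks[n].selected = 0;
        n++;
        p = q;
    }
    return n;
}

/* Step 3: the parser-specific part; it alone decides what counts as a match. */
static void
toy_mark(Token *toks, int n, const char *term)
{
    size_t      tlen = strlen(term);
    int         i;

    for (i = 0; i < n; i++)
        if (toks[i].is_word && toks[i].len == tlen &&
            strncmp(toks[i].start, term, tlen) == 0)
            toks[i].selected = 1;
}

/* Step 4: glue the tokens back into text, wrapping the marked ones in <b>. */
static void
toy_glue(const Token *toks, int n, char *out, size_t outsz)
{
    size_t      used = 0;
    int         i;

    out[0] = '\0';
    for (i = 0; i < n; i++)
    {
        int         w;

        if (toks[i].selected)
            w = snprintf(out + used, outsz - used, "<b>%.*s</b>",
                         (int) toks[i].len, toks[i].start);
        else
            w = snprintf(out + used, outsz - used, "%.*s",
                         (int) toks[i].len, toks[i].start);
        if (w < 0 || (size_t) w >= outsz - used)
            break;          /* output buffer full: stop with truncated text */
        used += (size_t) w;
    }
}
```

This toy splitter sees 'asd-wow' as three tokens. The default parser also
reports the hyphenated compound as a token of its own type, and that kind of
parser-specific knowledge is why the marking step belongs to the parser.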

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
