Re: ts_headline

From: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ts_headline
Date: 2008-02-22 09:55:55
Message-ID: 200802222025.55770.scldad@sdc.com.au
Lists: pgsql-general pgsql-patches

Unfortunately, my link to the box with the test database is down due to lack
of maintenance by our local telco (Telstra) but I think that I also missed
the optional config arg to ts_headline.

The outage also means that I cannot confirm your findings, but your logic
looks good.

It raises the question, however, of why ts_headline needs to reparse the raw
text.

At least in my case, I am using a trigger to parse the combination of Title
and Abstract into a tsvector column in the table row (as suggested in sections
12.2.2 and 12.4.3 of the doco), so the tsvector is already available to
ts_headline.
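For reference, a trigger of the kind described above might look like the following minimal sketch, which uses PostgreSQL's built-in tsvector_update_trigger. The table name papers, the columns title, abstract and tsv, and the sample row are hypothetical. The point to note is that the configuration given to the trigger should match the one later passed to to_tsquery() and ts_headline().

```sql
-- Hypothetical table: title and abstract are parsed into tsv on every write.
CREATE TABLE papers (
    id       serial PRIMARY KEY,
    title    text,
    abstract text,
    tsv      tsvector
);

-- Built-in trigger function: target column, configuration, source columns.
-- The configuration name is schema-qualified, as the documentation asks.
CREATE TRIGGER papers_tsv_update BEFORE INSERT OR UPDATE
    ON papers FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(tsv, 'pg_catalog.english', title, abstract);

-- Hypothetical sample row.
INSERT INTO papers (title, abstract)
    VALUES ('A test', 'my database is a database');

-- The WHERE clause matches against the stored tsvector, but ts_headline()
-- still re-parses the raw abstract, so pass it the same configuration.
SELECT ts_headline('english', abstract, q)
  FROM papers, to_tsquery('english', 'database') AS q
 WHERE tsv @@ q;
```

Only the search uses the pre-parsed column; the headline step works from the raw text, which is exactly the reparsing the message above asks about.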

If ts_headline could use that pre-parsed tsvector, my problem would never
have arisen - and ts_headline would perform better, since it would not need
to parse the text a second time.

Cheers and thanks,
Stephen

On Friday 22 February 2008 20:00, Richard Huxton wrote:
> Stephen Davies wrote:
> > Not quite:-(
> >
> > It is the ts_headline with the explicit "english" configuration that
> > "fails" rather than the implicit "simple".
>
> Hmm... arse.
>
> > That's what is so weird.
> >
> > As you say, the tsvector has "databas" so the "english" version of
> > ts_headline should work - but it doesn't. The "simple" version does,
> > despite the above.
>
> [goes away, tests some more]
>
> OK, so:
>
> set default_text_search_config = 'simple';
> SELECT ts_headline('my database is a database', to_tsquery('database'));
> SELECT ts_headline('my database is a database', to_tsquery('simple',
> 'database'));
> SELECT ts_headline('my database is a database', to_tsquery('english',
> 'database'));
>
> The first two work, the last one doesn't.
>
> set default_text_search_config = 'english';
> SELECT ts_headline('my database is a database', to_tsquery('database'));
> SELECT ts_headline('my database is a database', to_tsquery('simple',
> 'database'));
> SELECT ts_headline('my database is a database', to_tsquery('english',
> 'database'));
>
> The middle one doesn't work.
>
> Note that there are no indexes involved here, we're just running against
> the raw text.
>
> [light goes on over sluggish London-based database chap]
>
> When the ts_headline function is working on the text, it needs to
> convert it from varchar/text type to tsvector so that it can use the
> tsquery to find words to highlight.
>
> When it converts the text to a tsvector, it's doing it based on
> default_text_search_config - we've not told it otherwise. In an ideal
> world, it would look "inside" the tsquery and see what config that was
> using, but it can't (or at least doesn't).
>
> Of course, if to_tsquery()'s config doesn't match ts_headline()'s then
> we get a problem.
>
> And, if I actually bother to read an up-to-date copy of the manual,
> rather than the beta version I've got linked on my desktop I can see
> there's a parameter for ts_headline. So...
>
> set default_text_search_config = 'simple';
> SELECT ts_headline('english', 'my database is a database',
> to_tsquery('english','database')
> );
>
> set default_text_search_config = 'english';
> SELECT ts_headline('simple', 'my database is a database',
> to_tsquery('simple','database')
> );
>
>
> Both of these work fine. Phew!
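The mismatch Richard describes can be seen directly, without ts_headline at all, by comparing the tsvector each configuration produces with a tsquery built under 'english'. This is a small sketch against the raw text only; the comments describe the expected lexemes rather than exact output formatting.

```sql
-- 'english' stems 'database' to 'databas' and drops the stop words my, is, a.
SELECT to_tsvector('english', 'my database is a database');

-- 'simple' only lower-cases, so the lexeme stays 'database'.
SELECT to_tsvector('simple', 'my database is a database');

-- The query side: under 'english' the search term also becomes 'databas'.
SELECT to_tsquery('english', 'database');

-- Parsing the text with one configuration and the query with the other
-- leaves nothing to match, which is what ts_headline() runs into when it
-- falls back to default_text_search_config.
SELECT to_tsvector('simple',  'my database is a database')
       @@ to_tsquery('english', 'database');   -- false
SELECT to_tsvector('english', 'my database is a database')
       @@ to_tsquery('english', 'database');   -- true
```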

--
Stephen Davies Consulting Voice: 08-8177 1595
Adelaide, South Australia. Fax: 08-8177 0133
Computing & Network solutions. Mobile:0403 0405 83