Re: ts_headline

From: Richard Huxton <dev(at)archonet(dot)com>
To: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ts_headline
Date: 2008-02-21 12:00:25
Message-ID: 47BD67D9.8090503@archonet.com
Lists: pgsql-general pgsql-patches

Stephen Davies wrote:
> I just spotted the difference between your test and mine.
>
> My query says:
>
> select ts_headline(abstract,to_tsquery('english','database'),'minWords = 99,
> maxWords = 999') from document where id=21;
>
> where your equivalent does not include the 'english' arg.
>
> If I take out the 'english' from this query, I get the same result as you.

What does this give you:
show default_text_search_config;
I get pg_catalog.english and the same result for the query whether I use:
to_tsquery('english','database')
or to_tsquery('pg_catalog.english','database')

Could you be picking up a bad "english" configuration (see \dF)?
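
One way to check for a shadowing configuration from plain SQL, as a rough
sketch: it assumes the standard 8.3 system catalogs (pg_ts_config and
pg_namespace), and the column aliases are only illustrative. More than one
row back, or a schema other than pg_catalog, would suggest the unqualified
name 'english' is resolving to something unexpected.

```sql
-- List every text search configuration named "english" with its schema.
-- A row outside pg_catalog may be shadowing the built-in configuration,
-- depending on the order of schemas in search_path.
SELECT n.nspname AS config_schema,
       c.cfgname AS config_name
FROM pg_catalog.pg_ts_config AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.cfgnamespace
WHERE c.cfgname = 'english';

-- Unqualified configuration names are resolved through this path.
SHOW search_path;
```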

> However, the following returns zero rows:
>
> select title,author,ts_headline(abstract,to_tsquery('database')) from document
> where clob @@ to_tsquery('database')

I take it "clob" matches "abstract"?

> It gets more interesting:
>
> select title,author,ts_headline(abstract,to_tsquery('database')) from document
> where clob @@ to_tsquery('english','database')
>
> returns the "correct" result - one row with the expected headline.

Now that *is* strange. ts_headline() works without specifying 'english',
but the actual search only matches when 'english' is specified.

> select title,author,ts_headline(abstract,to_tsquery('english','thesaurus'))
> from document where clob @@ to_tsquery('english','thesaurus')
>
> also returns the "correct" result.
>
> I suggest that the above indicates a bug somewhere.

Could be - it'd be good to rule out a bad config. You might have an
unexpected list of stopwords or similar.

Let's try:
SELECT ts_debug('the database and thesaurus');
SELECT ts_debug('english', 'the database and thesaurus');
SELECT ts_debug('pg_catalog.english', 'the database and thesaurus');
I'd expect "the" and "and" to be stripped out as stopwords and the other
two to get through (with "database" stemmed to "databas").
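
It may also help to compare the parsed queries directly. This is only a
sketch (the column aliases are made up for illustration): if the first
column differs from the other two, the default configuration is not
behaving like english, which would explain why the two-argument form
matches and the one-argument form does not.

```sql
-- Compare how the same word is normalised under the default configuration
-- and under the explicitly named english configurations.
SELECT to_tsquery('database')                       AS default_cfg,
       to_tsquery('english', 'database')            AS english_cfg,
       to_tsquery('pg_catalog.english', 'database') AS catalog_cfg;
```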

--
Richard Huxton
Archonet Ltd
