Re: ts_headline

From: Richard Huxton <dev(at)archonet(dot)com>
To: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ts_headline
Date: 2008-02-22 09:03:29
Message-ID: 47BE8FE1.7050306@archonet.com
Lists: pgsql-general pgsql-patches

Stephen Davies wrote:
> OK. The first level explanation is that my default config is "simple".

Aha! Actually, that's the whole explanation.

> This explains the different query results as "english" reduces "database" to
> "databas" while "simple" does not reduce it at all.

Exactly.

> The "document" is parsed/indexed using "english" explicitly so my queries need
> to be explicit also (not an issue as all "real" queries are generated rather
> than typed).

Or change your default configuration to match the one you're using.
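
A minimal sketch of how that could be done, assuming the built-in
"english" configuration is the one used for indexing. The database name
mydb is hypothetical:

```sql
-- Show the configuration used when none is given explicitly.
SHOW default_text_search_config;

-- Change it for the current session only.
SET default_text_search_config = 'pg_catalog.english';

-- Or make it the default for new sessions on one database
-- (mydb is a placeholder name).
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';
```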

> However, I still cannot see a reason for the ts_headline results. If anything,
> they should be the other way around.
>
> I suspect that ts_headline may only work properly when no configuration is
> specified - regardless of the default setting.

No. What's happening is that your tsvector representation of the
document (which gets indexed) contains lexemes processed by your
"english" config. So, it will have something like:
... databas: 123, 129, 200 ...
Of course, when you do a tsquery search with the "simple" configuration,
it doesn't do any stemming, so it is actually looking for a lexeme
called "database", which it can't find.
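
The mismatch can be reproduced without any table. This is only a sketch
using literal strings; the sample text is made up for illustration:

```sql
-- The same word becomes a different lexeme under each configuration.
SELECT to_tsvector('english', 'database') AS english_lexemes,  -- 'databas':1
       to_tsvector('simple',  'database') AS simple_lexemes;   -- 'database':1

-- Document stemmed with "english", query left unstemmed by "simple":
-- the query looks for 'database' but the vector only holds 'databas'.
SELECT to_tsvector('english', 'a database')
       @@ to_tsquery('simple', 'database') AS mismatched;      -- false

-- Same configuration on both sides: both reduce to 'databas'.
SELECT to_tsvector('english', 'a database')
       @@ to_tsquery('english', 'database') AS matched;        -- true
```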

Since it can't find anything, it falls back to displaying just the start
of the document. Since the alternative would be to display nothing, that
makes a certain amount of sense.

To check this, try: ts_headline(t, to_tsquery('simple','databas')) and
you should get your database results.

Moral of the story: if you specify a configuration anywhere, specify it
everywhere.
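
As a sketch of what that looks like for ts_headline, which accepts an
optional configuration as its first argument: the sample sentence is
invented, and the exact headline text depends on the default headline
options.

```sql
-- Name the configuration for both the document parsing done by
-- ts_headline and the query, so both sides agree on 'databas'.
SELECT ts_headline('english',
                   'PostgreSQL is an open source relational database system.',
                   to_tsquery('english', 'database'));
```

With both sides using "english", the word "database" in the text should
be the part that gets highlighted.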

Thanks for working through this, Stephen - good question specification btw.

--
Richard Huxton
Archonet Ltd
