Re: BUG #4562: ts_headline() adds space when parsing url

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Denis Monsieur" <dmonsieur(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: BUG #4562: ts_headline() adds space when parsing url
Date: 2008-12-09 01:20:28
Message-ID: 4357.1228785628@sss.pgh.pa.us
Lists: pgsql-bugs
"Denis Monsieur" <dmonsieur(at)gmail(dot)com> writes:
> The problem is a space being added to text in the form of
> http://some.url/path
> Compare the output:

> shs=# SELECT ts_headline('http://some.url', to_tsquery('sometext'));
>    ts_headline
> -----------------
>  http://some.url
> (1 row)

> shs=# SELECT ts_headline('http://some.url/path', to_tsquery('sometext'));
>       ts_headline
> -----------------------
>  http:// some.url/path
> (1 row)

I looked into this, and it seems that the problem is that
generateHeadline() emits a space for any token marked as replace = 1.
I think it probably shouldn't emit anything at all.  AFAICS the cases
where replace will get set are token types URL, TAG, NUMHWORD,
ASCIIHWORD, HWORD.  For URL and the HWORD variants the space is
certainly undesirable, because these token types are just respecifying
text that is also covered by their component tokens.  The only case
where you could make an argument that the space is useful is TAG,
as in

regression=# SELECT ts_headline('http<foo>blah', to_tsquery('sometext'));
 ts_headline 
-------------
 http blah
(1 row)

But it seems to me at least as plausible that a removed tag should
produce nothing as that it should produce a space.
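
To make the two behaviors concrete, here is a minimal standalone sketch
in C.  It is not the generateHeadline() source: the Token struct, the
ReplacePolicy enum, and render_headline() are hypothetical names made up
for illustration, and the token breakdown in the usage note is assumed
to be what the default parser reports (protocol, url, host, url_path).
The only thing it models is the rule described above: a token with
replace set produces either one space or nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a headline word; not PostgreSQL's struct. */
typedef struct {
    const char *text;   /* token text as produced by the parser */
    int         replace; /* nonzero: token is not copied to the output */
} Token;

/* What to emit in place of a token whose replace flag is set. */
typedef enum { REPLACE_WITH_SPACE, REPLACE_WITH_NOTHING } ReplacePolicy;

/* Append s to out (current length len, capacity cap), truncating if needed. */
static size_t append(char *out, size_t cap, size_t len, const char *s)
{
    size_t n = strlen(s);

    if (len + n >= cap)          /* keep room for the terminating NUL */
        n = cap - 1 - len;
    memcpy(out + len, s, n);
    return len + n;
}

/* Build the output text from tokens under the chosen policy. */
size_t render_headline(const Token *toks, size_t ntoks, ReplacePolicy policy,
                       char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < ntoks; i++) {
        if (toks[i].replace) {
            /* Behavior under discussion: one space per replaced token. */
            if (policy == REPLACE_WITH_SPACE)
                len = append(out, cap, len, " ");
            /* Proposed alternative: emit nothing at all. */
        } else {
            len = append(out, cap, len, toks[i].text);
        }
    }
    out[len] = '\0';
    return len;
}
```

Feeding it the assumed tokens for the reported input, that is "http://",
then "some.url/path" with replace set, then "some.url" and "/path",
gives "http:// some.url/path" under REPLACE_WITH_SPACE and
"http://some.url/path" under REPLACE_WITH_NOTHING.  For the TAG example,
"http", "<foo>" with replace set, and "blah" give "http blah" and
"httpblah" respectively, which is the tradeoff in question.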

Comments?

			regards, tom lane
