Re: BUG #4562: ts_headline() adds space when parsing url

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis Monsieur <dmonsieur(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: BUG #4562: ts_headline() adds space when parsing url
Date: 2009-01-15 06:19:49
Message-ID: Pine.LNX.4.64.0901150919130.9554@sn.sai.msu.ru
Lists: pgsql-bugs
On Wed, 14 Jan 2009, Bruce Momjian wrote:

>
> This bug still exists in my testing.

We have fixed all the issues with ts_headline and will submit the fix soon.

>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
>> "Denis Monsieur" <dmonsieur(at)gmail(dot)com> writes:
>>> The problem is a space being added to text in the form of
>>> http://some.url/path
>>> Compare the output:
>>
>>> shs=# SELECT ts_headline('http://some.url', to_tsquery('sometext'));
>>>    ts_headline
>>> -----------------
>>>  http://some.url
>>> (1 row)
>>
>>> shs=# SELECT ts_headline('http://some.url/path', to_tsquery('sometext'));
>>>       ts_headline
>>> -----------------------
>>>  http:// some.url/path
>>> (1 row)
>>
>> I looked into this, and it seems that the problem is that
>> generateHeadline() emits a space for any token marked as replace = 1.
>> I think it probably shouldn't emit anything at all.  AFAICS the cases
>> where replace will get set are token types URL, TAG, NUMHWORD,
>> ASCIIHWORD, HWORD.  For URL and the HWORD variants the space is
>> certainly undesirable, because these token types are just respecifying
>> text that is also covered by their component tokens.  The only case
>> where you could make an argument that the space is useful is TAG,
>> as in
>>
>> regression=# SELECT ts_headline('http<foo>blah', to_tsquery('sometext'));
>>  ts_headline
>> -------------
>>  http blah
>> (1 row)
>>
>> But it seems to me to be at least as plausible that you should get
>> nothing as that you should get a space for a removed tag.
>>
>> Comments?
>>
>> 			regards, tom lane
>>
>> --
>> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-bugs
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
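
For readers following the thread, the behavior Tom Lane describes in the
quoted text can be modeled with a small sketch. This is not the PostgreSQL
source: generate_headline() below is a hypothetical Python stand-in for the
C function generateHeadline(), the Token class is invented for illustration,
and the token streams are assumptions about how the default parser might
split the inputs (a compound URL token followed by its component tokens).
It only shows why emitting a space for each token marked replace yields
"http:// some.url/path", and what emitting nothing would give instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    # Hypothetical stand-in for the "replace = 1" marking mentioned in the thread.
    replace: bool = False


def generate_headline(tokens: List[Token], replaced_with: str) -> str:
    """Concatenate token text, substituting `replaced_with` for replaced tokens."""
    parts = []
    for tok in tokens:
        parts.append(replaced_with if tok.replace else tok.text)
    return "".join(parts)


# Assumed token stream for 'http://some.url/path': the compound URL token
# repeats text that its component tokens (host, path) also cover.
url_tokens = [
    Token("http://"),
    Token("some.url/path", replace=True),
    Token("some.url"),
    Token("/path"),
]

# Assumed token stream for 'http<foo>blah': the tag is the replaced token.
tag_tokens = [
    Token("http"),
    Token("<foo>", replace=True),
    Token("blah"),
]

# Behavior reported in the bug: a space is emitted for each replaced token.
print(repr(generate_headline(url_tokens, " ")))
print(repr(generate_headline(tag_tokens, " ")))

# Behavior suggested in the thread: emit nothing for a replaced token.
print(repr(generate_headline(url_tokens, "")))
print(repr(generate_headline(tag_tokens, "")))
```

In this model, dropping the space repairs the URL case, while the TAG case
changes from "http blah" to "httpblah", which is the trade-off raised in the
quoted message.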
