
Re: BUG #4562: ts_headline() adds space when parsing url

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Denis Monsieur <dmonsieur(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: BUG #4562: ts_headline() adds space when parsing url
Date: 2009-01-15 01:38:31
Message-ID: 200901150138.n0F1cVf11755@momjian.us
Lists: pgsql-bugs
This bug still exists in my testing.

---------------------------------------------------------------------------

Tom Lane wrote:
> "Denis Monsieur" <dmonsieur(at)gmail(dot)com> writes:
> > The problem is a space being added to text in the form of
> > http://some.url/path
> > Compare the output:
> 
> > shs=# SELECT ts_headline('http://some.url', to_tsquery('sometext'));
> >    ts_headline
> > -----------------
> >  http://some.url
> > (1 row)
> 
> > shs=# SELECT ts_headline('http://some.url/path', to_tsquery('sometext'));
> >       ts_headline
> > -----------------------
> >  http:// some.url/path
> > (1 row)
> 
> I looked into this, and it seems that the problem is that
> generateHeadline() emits a space for any token marked as replace = 1.
> I think it probably shouldn't emit anything at all.  AFAICS the cases
> where replace will get set are token types URL, TAG, NUMHWORD,
> ASCIIHWORD, HWORD.  For URL and the HWORD variants the space is
> certainly undesirable, because these token types are just respecifying
> text that is also covered by their component tokens.  The only case
> where you could make an argument that the space is useful is TAG,
> as in
> 
> regression=# SELECT ts_headline('http<foo>blah', to_tsquery('sometext'));
>  ts_headline 
> -------------
>  http blah
> (1 row)
> 
> But it seems to me to be at least as plausible that you should get
> nothing as that you should get a space for a removed tag.
> 
> Comments?
> 
> 			regards, tom lane

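The behavior Tom describes in the quoted message can be reproduced with a toy Python model. This is only a sketch, not PostgreSQL's C code: the Token type, the render() function, and the token sequences are assumptions made for illustration, with the sequences modeled on what the default parser appears to emit (a URL token that restates the following host and path tokens). A token marked replace is rendered as a replacement string; passing " " models the current behavior and "" models the proposed one.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    replace: bool  # True when the token restates text covered by other tokens


def render(tokens: List[Token], replacement: str) -> str:
    """Concatenate token text, substituting `replacement` for replace tokens."""
    return "".join(replacement if t.replace else t.text for t in tokens)


# Assumed token sequences (hypothetical, for illustration only).
URL_NO_PATH = [Token("http://", False), Token("some.url", False)]
URL_WITH_PATH = [
    Token("http://", False),
    Token("some.url/path", True),  # URL token, duplicates host + path below
    Token("some.url", False),      # host
    Token("/path", False),         # path
]
TAG = [Token("http", False), Token("<foo>", True), Token("blah", False)]

for name, toks in [("no path", URL_NO_PATH),
                   ("with path", URL_WITH_PATH),
                   ("tag", TAG)]:
    # Left: a space per replace token (current); right: nothing (proposed).
    print(f"{name}: {render(toks, ' ')!r} -> {render(toks, '')!r}")
```

In this model only the TAG case loses a separator under the proposed change ('http blah' becomes 'httpblah'), which is the trade-off raised in the quoted message.
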
-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
