Bug with Tsearch and tsvector

From: "Donald Fraser" <postgres(at)kiwi-fraser(dot)net>
To: "[BUGS]" <pgsql-bugs(at)postgresql(dot)org>
Subject: Bug with Tsearch and tsvector
Date: 2010-04-26 13:51:35
Message-ID: E7CE594F0C6149D48DA8D0D9937A4915@DEVELOP1
Lists: pgsql-bugs

PostgreSQL 8.3.10 (on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46))
OS: Linux Redhat EL 5.4
Database encoding: LATIN9

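Using the default 'english' text search configuration, text is parsed incorrectly when it is converted to the tsvector type.
The following example demonstrates the failure, using the ts_headline function to make it visible: the tags immediately following the URL are left in the output, while the other tags are stripped as expected.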

SELECT ts_headline('english', 'The annual financial report will shortly be posted on the Company&#8217;s web-site at
<span lang="EN-GB">http://www.harewoodsolutions.co.uk/press.aspx</span><span lang="EN-GB" style=""></span><span style="">
and a further announcement will be made once the annual financial report is available to be downloaded. </span>',
to_tsquery(''), 'MaxWords=101, MinWords=100');

Output:
"The annual financial report will shortly be posted on the Company&#8217;s web-site at
http://www.harewoodsolutions.co.uk/press.aspx</span><span lang="EN-GB" style="">
and a further announcement will be made once the annual financial report is available to be downloaded. "

Expected output:
"The annual financial report will shortly be posted on the Company&#8217;s web-site at
http://www.harewoodsolutions.co.uk/press.aspx
and a further announcement will be made once the annual financial report is available to be downloaded. "
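
To see how the parser classifies each token around the URL, the ts_debug function can be run on a shortened version of the same input. This is only a diagnostic sketch: the shortened string is an assumption chosen for illustration, and it has not been verified to trigger the failure on its own.

```sql
-- Diagnostic sketch: list the token type (alias) and the raw token text
-- that the default parser produces for the markup surrounding the URL.
-- The input is a shortened, illustrative excerpt of the example above.
SELECT alias, token
FROM ts_debug('english',
  '<span lang="EN-GB">http://www.harewoodsolutions.co.uk/press.aspx</span><span lang="EN-GB" style=""></span>');
```

If the closing tags appear inside a URL-related token rather than as separate tag tokens, that would be consistent with the ts_headline output shown above.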

Regards
Donald Fraser
