Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dan O'Hara <danarasoftware(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date: 2009-10-23 01:44:30
Message-ID: 20091023014430.GC2240@alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Euler Taveira de Oliveira wrote:
> Robert Haas wrote:
> > I'm not real familiar with ts_parse(), but I'm thinking that it
> > doesn't have any special casing for email addresses and is just
> > intended to parse text for full-text-search - in which case splitting
> > on _ is a pretty good algorithm.
> >
> It is a bug. tsearch claims to identify token types, but it doesn't
> correctly identify every valid e-mail address. As Dan stated, ts_parse() fails
> to recognize an e-mail address. For example, foo+bar(at)baz(dot)com is a valid
> e-mail address, but the function fails to report it as one.

It is similarly too simplistic for other cases, like file names
(particularly where Windows file names are concerned).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
