From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Dan O'Hara <danarasoftware(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores |
Date: | 2009-10-23 01:44:30 |
Message-ID: | 20091023014430.GC2240@alvh.no-ip.org |
Lists: | pgsql-bugs pgsql-hackers |
Euler Taveira de Oliveira wrote:
> Robert Haas wrote:
> > I'm not real familiar with ts_parse(), but I'm thinking that it
> > doesn't have any special casing for email addresses and is just
> > intended to parse text for full-text-search - in which case splitting
> > on _ is a pretty good algorithm.
> >
> It is a bug. tsearch claims to identify types of tokens, but it doesn't
> correctly identify all valid e-mail addresses. As Dan stated, ts_parse() fails
> to recognize an e-mail address. For example, foo+bar(at)baz(dot)com is a valid
> e-mail address, but the function fails to report it as one.
It is similarly too simplistic for other cases, such as file names
(particularly where Windows file names are concerned).
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
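
To illustrate the point discussed above, here is a minimal Python sketch of why splitting on characters such as `_` or `+` loses e-mail addresses. It is not PostgreSQL's parser: `naive_tokens`, `find_emails`, and `EMAIL_RE` are hypothetical names made up for this example, and the pattern is a deliberately simplified stand-in for real address syntax.

```python
import re

# Hypothetical, deliberately naive tokenizer: splits on anything that is not
# a letter or digit. This is NOT PostgreSQL's parser, only an illustration.
def naive_tokens(text):
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

# Hypothetical, simplified e-mail pattern whose local part accepts
# '_', '+', '.', and '-'. Real address syntax (RFC 5322) is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_emails(text):
    # Return every substring that matches the simplified pattern.
    return EMAIL_RE.findall(text)

sample = "write to foo_bar@baz.com or foo+bar@baz.com"
print(naive_tokens(sample))  # the addresses are broken into fragments
print(find_emails(sample))   # both addresses are kept whole
```

The same kind of breakage would be expected for file names, where separators such as `\`, `.`, and `_` are part of a single logical token.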