Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, "Dan O'Hara" <danarasoftware(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date: 2010-03-13 01:09:32
Message-ID: 201003130109.o2D19WG28274@momjian.us
Lists: pgsql-bugs pgsql-hackers

Alvaro Herrera wrote:
>
> Upon seeing this patch I considered that I use addresses such as
> alvherre+stuff(at)something(dot)org and wondered how could this thing support
> that. I don't think we want extra parser stuff just to add whatever
> random junk we want to support in email addresses ...

Well, I think the big question is whether we need to honor RFC 5322
(http://www.rfc-editor.org/rfc/rfc5322.txt). Wikipedia says these are
all valid characters:

http://en.wikipedia.org/wiki/E-mail_address

* Uppercase and lowercase English letters (a-z, A-Z)
* Digits 0 to 9
* Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
* Character . (dot, period, full stop) provided that it is not the
first or last character, and provided also that it does not appear two
or more times consecutively.
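
To make the rules in that list concrete, here is a minimal sketch of a check for the local part (the text before the @). It covers only the character rules quoted above. It ignores quoted local parts, length limits, and the domain part, so it is not a full RFC 5322 validator. The name is_valid_local_part is hypothetical, and rejecting an empty string is an assumption rather than something the list states.

```python
import string

# Characters the list above allows in the local part, besides the dot.
ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_valid_local_part(local: str) -> bool:
    """Check an unquoted local part against the character rules listed above.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    # Assumption: an empty local part is treated as invalid.
    if not local:
        return False
    # A dot may not be first, last, or appear twice in a row.
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    # Every remaining character must be a dot or in the allowed set.
    return all(ch == "." or ch in ALLOWED for ch in local)
```

Under these rules both "first+last" and "first_last" pass, while "first..last" and ".first" are rejected.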

And we don't currently honor most of the special characters, including
plus:

test=> select ts_parse('default', ' first+last(at)yahoo(dot)com ' );
      ts_parse
--------------------
 (12," ")
 (1,first)
 (12,+)
 (4,last(at)yahoo(dot)com)
 (12," ")
(5 rows)

Where does this leave us? Do we add the other characters? Do we
document that we only allow a limited number of characters for email
addresses? What is the logic in that? Do any of these characters
conflict with our tsquery operators?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
