Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, "Dan O'Hara" <danarasoftware(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date: 2010-03-13 01:18:36
Message-ID: 20643.1268443116@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Well, I think the big question is whether we need to honor RFC 5322
> (http://www.rfc-editor.org/rfc/rfc5322.txt). Wikipedia says these are
> all valid characters:

> http://en.wikipedia.org/wiki/E-mail_address

> * Uppercase and lowercase English letters (a-z, A-Z)
> * Digits 0 to 9
> * Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
> * Character . (dot, period, full stop) provided that it is not the
> first or last character, and provided also that it does not appear two
> or more times consecutively.

That's an awful lot of special characters. For the RFC's purposes,
it's not hard to be flexible because in an email message there is
external context telling where to expect an address. I think if we
tried to allow all of those in email addresses in tsearch, we'd have
"email addresses" gobbling up a whole lot of adjacent text, to nobody's
benefit.

I can see the case for adding "+" because that's fairly common as Alvaro
notes, but I think we should be very circumspect about going farther.
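
To make the "gobbling up adjacent text" concern concrete, here is a minimal sketch using plain regular expressions. It is not how the tsearch parser is implemented; the two character classes, the `find_emails` helper, and the sample text are all hypothetical and exist only to contrast a conservative local-part alphabet (letters, digits, `_`, `+`, `-`, `.`) with one that accepts every special character in the list quoted above.

```python
import re

# Hypothetical character classes, for illustration only; these are
# assumptions and are not taken from the tsearch parser.
CONSERVATIVE_LOCAL = r"[A-Za-z0-9_+-]+(?:\.[A-Za-z0-9_+-]+)*"
PERMISSIVE_LOCAL = (
    r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
    r"(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
)
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"


def find_emails(local, text):
    """Return every substring of text matching local@domain."""
    pattern = re.compile(local + "@" + DOMAIN)
    return [m.group(0) for m in pattern.finditer(text)]


# Free text with no external context marking where the address starts.
TEXT = "rate=hits/total?mail=dan_ohara@example.com"

conservative = find_emails(CONSERVATIVE_LOCAL, TEXT)
permissive = find_emails(PERMISSIVE_LOCAL, TEXT)

print(conservative)  # ['dan_ohara@example.com']
print(permissive)    # ['rate=hits/total?mail=dan_ohara@example.com']
```

With the permissive alphabet, `=`, `/` and `?` all count as address characters, so the match extends leftward over the whole string; the conservative alphabet stops at the `=` and returns only the address itself.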

regards, tom lane
