Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: "Dan O'Hara" <danarasoftware(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: [BUGS] BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date: 2010-03-13 00:48:24
Message-ID: 201003130048.o2D0mOP16522@momjian.us
Lists: pgsql-bugs pgsql-hackers

Teodor Sigaev wrote:
> > Oleg, Teodor, can you look at this? I tried to fix it in wparser_def.c,
> > but couldn't figure out how. Thanks.
> >>
> >> select distinct token as email
> >> from ts_parse('default', ' first_last(at)yahoo(dot)com ' )
> >> where tokid = 4
>
> Patch in attachment; it allows an underscore in the middle of the local part
> of an email address and in the host name (similarly to the '-' character).

Thanks, patch applied.

> I'm not sure about backpatching, because it could break existing search
> configuration.

Agreed. I don't think this warrants backpatching.

Here is the before behavior:

test=> select ts_parse('default', ' first_last(at)yahoo(dot)com ' );
ts_parse
--------------------
(12," ")
(1,first)
(12,_)
--> (4,last(at)yahoo(dot)com)
(12," ")
(5 rows)

and the after-patch, fixed behavior:

test=> select ts_parse('default', ' first_last(at)yahoo(dot)com ' );
ts_parse
--------------------------
(12," ")
--> (4,first_last(at)yahoo(dot)com)
(12," ")
(3 rows)
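
For anyone reproducing this, here is a minimal sketch that labels each token with its type name instead of the bare tokid. It assumes the stock 'default' parser, and the address is written with a literal @ and . (the archive shows them as (at) and (dot)). ts_token_type() is the standard function that lists a parser's token types.

```sql
-- Join each parsed token to its token-type alias, so the email token
-- is identified by name rather than by the number 4.
SELECT tokid, t.alias, p.token
FROM ts_parse('default', ' first_last@yahoo.com ') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

On a patched server the whole address should come back as a single token with alias "email"; on an unpatched one, "first" and "_" should appear as separate tokens ahead of it.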

I assume that, because this only expands the pattern space for email
addresses, there is no effect on binary upgrades with this patch. Is that
correct? Would an email address check on a binary-upgraded tsvector
index not match an email address with underscores? Do we need a warning
in the release notes about this?
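
To make the binary-upgrade question concrete, here is a rough sketch, not a tested result. The tsvector literal is a hand-written stand-in for what the pre-patch parser would have stored (lexemes 'first' and 'last@yahoo.com'). It is an assumption for illustration, not output copied from a real index. The query side goes through whatever parser the running server has.

```sql
-- Left side: hypothetical tsvector as built by the pre-patch parser.
-- Right side: the same address parsed by the currently running server.
SELECT $$'first':1 'last@yahoo.com':2$$::tsvector
       @@ plainto_tsquery('simple', 'first_last@yahoo.com') AS old_vector_matches;
```

If the patched parser reduces the query to the single lexeme 'first_last@yahoo.com', this would be expected to return false until the stored tsvector is rebuilt, which is the case a release note would need to cover.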

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
