Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Dan O'Hara" <danarasoftware(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5021: ts_parse doesn't recognize email addresses with underscores
Date: 2009-10-22 16:29:39
Message-ID: 603c8f070910220929i14dbcdcfw648e0c1a7ae19ef@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Aug 28, 2009 at 9:59 AM, Dan O'Hara <danarasoftware(at)gmail(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5021
> Logged by:          Dan O'Hara
> Email address:      danarasoftware(at)gmail(dot)com
> PostgreSQL version: 8.3.7
> Operating system:   win32
> Description:        ts_parse doesn't recognize email addresses with
> underscores
> Details:
>
> In the following example,
>
> select distinct token as email
> from ts_parse('default', ' first_last@yahoo.com '   )
> where tokid = 4
>
> ts_parse returns last@yahoo.com rather than first_last@yahoo.com.  It seems
> that any text prior to the underscore is truncated.  If the portion
> following the underscore is only numeric, such as this example,
>
> select distinct token as email
> from ts_parse('default', ' bill_2000@yahoo.com '   )
> where tokid = 4
>
> then ts_parse returns nothing at all.
>
> Section 3.2.3 of RFC 5322 indicates that underscores are valid characters in
> an email address.
>
> http://tools.ietf.org/html/rfc5322

I don't think this has much to do with email addresses. If you do:

select token from ts_parse('default', 'a_b');

...you get three tokens. In your case you're pulling out the fourth
token, but some of your examples don't have four tokens, so then you
get nothing at all.
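
To see what the parser does with each piece, it may help to list the
tokid next to the token and look up what that tokid means. This is only
a sketch: it assumes the built-in 'default' parser and the
ts_token_type() function that ships with text search, and I'm not
showing output because it can differ between server versions.

```sql
-- ts_parse() returns (tokid, token) for each piece of the input.
-- ts_token_type() returns (tokid, alias, description) for a parser.
-- Joining on tokid shows how each piece was classified.
select p.tokid, t.alias, p.token
from ts_parse('default', 'a_b') as p
join ts_token_type('default') as t using (tokid);
```

Swapping 'a_b' for one of the addresses from the report shows which
tokid values come back for it.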

I'm not very familiar with ts_parse(), but I'm thinking that it
doesn't have any special casing for email addresses and is just
intended to parse text for full-text search, in which case splitting
on _ is a pretty good algorithm.

...Robert
