Re: Select all invalid e-mail addresses

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Andrus <eetasoft(at)online(dot)ee>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Select all invalid e-mail addresses
Date: 2005-10-21 18:49:09
Message-ID: 20051021184909.GA59288@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Oct 21, 2005 at 08:15:23PM +0300, Andrus wrote:
> I tried
>
> SELECT email FROM customer
> WHERE email !~
> '/^[^(at)]*@(?:[^(at)]*\(dot))?[a-z0-9-_]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro)$/'
>
> but got an error
>
> ERROR: invalid regular expression: invalid character range

Aside from the fact that this regular expression is semantically wrong,
it has a few other problems:

* A hyphen (-) must come first or last in a character class if you want
it interpreted literally instead of as part of a range specification.

test=> SELECT 'abc' ~ '[a-z0-9-_]'; -- WRONG
ERROR: invalid regular expression: invalid character range

test=> SELECT 'abc' ~ '[a-z0-9_-]';
?column?
----------
t
(1 row)

* Regular expressions in PostgreSQL don't use delimiters like / at
the beginning and end of the expression.

test=> SELECT 'abc' ~ '/abc/'; -- WRONG
?column?
----------
f
(1 row)

test=> SELECT 'abc' ~ 'abc';
?column?
----------
t
(1 row)

* If you use single quotes around the regular expression then you
need to escape backslashes that should be part of the regular
expression; otherwise the backslash will be parsed by the string
parser before the string is used as a regular expression and you'll
get unexpected results. In other words, there's an extra layer of
string parsing that you have to allow for. In 8.0 and later you
can avoid this by using dollar quotes.

test=> SELECT 'abc' ~ 'a\.c'; -- WRONG
?column?
----------
t
(1 row)

test=> SELECT 'abc' ~ 'a\\.c';
?column?
----------
f
(1 row)

test=> SELECT 'a.c' ~ 'a\\.c';
?column?
----------
t
(1 row)

test=> SELECT 'abc' ~ $$a\.c$$;
?column?
----------
f
(1 row)

test=> SELECT 'a.c' ~ $$a\.c$$;
?column?
----------
t
(1 row)

--
Michael Fuhr

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Steve Atkins 2005-10-21 18:49:54 Re: Select all invalid e-mail addresses
Previous Message Scott Marlowe 2005-10-21 18:38:13 Re: scheduled backup