Re: Funtion to clean up strings?

From: Andreas <maps(dot)on(at)gmx(dot)net>
To: PostgresSQL list <pgsql-sql(at)postgresql(dot)org>
Cc: raju(at)linux-delhi(dot)org
Subject: Re: Funtion to clean up strings?
Date: 2009-02-17 15:32:14
Message-ID: 499AD87E.4010206@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi
Thanks, that really works :)

Now a last extension.
Some numbers were entered in a "110% perfect" way with an excessive (0).
+49 (0) 123 / 456 789
I have to suspect the source liked to express that it's either +49 or
0 if the +49 isn't applicable, but not both.
Both together are semantically wrong and your function results therefore
to "00123456789".
Correct was "0123456789" or e.g. "+33123456789" if it were an
international number.

This (0) should be silently dropped as long as the endresult has at
least one 0 or + like in the allready covered cases.

I tried to use this RegEx magic myself as far as I could figure it out,
yet and came up with replacing every p in your solution with another regex

case
when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49'
then '0'||
regexp_replace(
regexp_replace(
regexp_replace(p, E'[^0-9+()]', '', 'g')
, '\\(0\\)||\\(||\\)', '', 'g')
, E'^(?:\\+|00)49(.*)', E'\\1')
when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)'
then '+'||
regexp_replace(
regexp_replace(
regexp_replace(p, E'[^0-9+()]', '', 'g')
, '\\(0\\)||\\(||\\)', '', 'g')
, E'^(?:\\+||00)(.*)', E'\\1')
else
regexp_replace(p, E'[^0-9]', '', 'g')
end

That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", too.
Creating a sql-function thows a WARNING: nonstandard use of \\ in a
string literal
but it still works. Do you know a better or more correct way to reach
the same?

Perhaps one could find a way with less calls to regexp_replace ?

Regards
Andreas :)

Raj Mathur wrote:
> On Friday 13 Feb 2009, Andreas wrote:
>
>> now ... lets get more complicated.
>> Phone numbers are entered:
>> 0123/4567-89 national number
>> 0049/123/4567-89 the same number
>> +49/123/4567-89 still the same number
>>
>> should come out as 0123456789 to search in this column.
>> "0049" and "+49" --> 0
>>
>> while international numbers
>> +33/123456789
>> 0033/123456789
>>
>> should come as
>> +33123456789
>>
>
> TEST=> create table foo(p text);
>
> TEST=> insert into foo (select regexp_split_to_table('0123/4567-89
> 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' '));
>
> TEST=> select * from foo;
> p
> ------------------
> 0123/4567-89
> 0049/123/4567-89
> +49/123/4567-89
> +33/123456789
> 0033/123456789
> (5 rows)
>
> TEST=> select
> (case
> when p ~ E'^(\\+|00)49'
> then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'),
> E'^(?:\\+|00)49(.*)', E'\\1')
> when p ~ E'^(\\+|00)'
> then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'),
> E'^(?:\\+||00)(.*)', E'\\1')
> else
> regexp_replace(p, E'[^0-9]', '', 'g')
> end)
> from foo;
> regexp_replace
> ----------------
> 0123456789
> 0123456789
> 0123456789
> +33123456789
> +33123456789
> (5 rows)
>
> That do what you want? (Apologies for the wrapped lines.)
>
> Regards,
>
> -- Raju
>

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Raj Mathur 2009-02-17 15:51:44 Re: Funtion to clean up strings?
Previous Message Richard Huxton 2009-02-17 15:30:44 Re: How to find number of seconds between 2 timestamps