Re: Scandinavian characters in regular expressions

From: Søren Vainio <sva(at)Netpointers(dot)com>
To: 'Andreas Joseph Krogh' <andreak(at)officenet(dot)no>, "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Scandinavian characters in regular expressions
Date: 2002-04-09 11:28:38
Message-ID: 910513A5A944D5118BE900C04F67CB5A1F82C6@MAIL
Lists: pgsql-sql

With \s, the three-word test does return FALSE:
SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
But it also returns FALSE for any two-word string, for example:
SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$';
where I would expect TRUE.
(I am using PostgreSQL 7.1.3.)
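
A quick cross-check outside PostgreSQL may help separate the two questions in this thread. The sketch below uses Python's re module, not the PostgreSQL 7.1.3 regex engine, so it only shows what the patterns should mean in principle. The helper names (traverse, matches) are hypothetical, and DEGRADED is an assumption: it is what the \s pattern would look like if the backslashes were dropped before the pattern reached the regex engine. That is one possible explanation for the FALSE on 'one two', not something confirmed here.

```python
import re

# The pattern from the thread: exactly two space-separated words.
TWO_WORDS = re.compile(r"^[^ ]+[ ][^ ]+$")

# Assumption: '^[^\s]+[\s][^\s]+$' with every backslash stripped,
# so each "\s" is read as the literal letter "s".
DEGRADED = re.compile(r"^[^s]+[s][^s]+$")


def traverse(text, ch="å"):
    """Yield copies of text with ch inserted at every possible position."""
    for i in range(len(text) + 1):
        yield text[:i] + ch + text[i:]


def matches(pattern, text):
    """Return True if the compiled pattern matches text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    # Intended behaviour: only the two-word string should match,
    # wherever the å is placed in the three-word string.
    for s in ["one two", "one two three", *traverse("one two three")]:
        print(f"{s!r:20} {matches(TWO_WORDS, s)}")

    # Under the stripped-backslash assumption, strings without a
    # literal "s" cannot match at all.
    for s in ["one two", "oneå two three"]:
        print(f"{s!r:20} degraded: {matches(DEGRADED, s)}")
```

If the degraded reading is what happens on the server, the FALSE for 'oneå two three' with \s would be a coincidence rather than a fix, since the pattern would then require a literal "s" between the two words.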

> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org
> [mailto:pgsql-sql-owner(at)postgresql(dot)org] On behalf of Andreas
> Joseph Krogh
> Sent: 9 April 2002 11:53
> To: 'pgsql-sql(at)postgresql(dot)org'
> Subject: Re: [SQL] Scandinavian characters in regular expressions
>
>
> On Tuesday 09 April 2002 10:51, Søren Vainio wrote:
> > Can someone please explain the following?
> > I am using a regular expression to find strings containing two words
> > (one or more non-space characters, followed by a space, followed by
> > one or more non-space characters).
> > But when Scandinavian characters are included, it returns different
> > results depending on where the character is positioned.
> > The first two-word example returns TRUE as expected.
> > The second three-word example returns FALSE as expected.
> > But when I let an å (&#229; &aring; a-ring) traverse through the
> > string, it unexpectedly returns TRUE when the character is the
> > second-to-last or last character of either of the first two words.
> >
> > SELECT 'one two' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'åone two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'oåne two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'onåe two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one åtwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one tåwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one twåo three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one twoå three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two åthree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two tåhree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thåree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thråee' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threåe' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threeå' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> >
> > Thank you for any response.
> >
> > Søren Vainio, Denmark
>
> I just tried the following which returned false as expected:
> andreak=# SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
> ?column?
> ----------
> f
> (1 row)
>
> andreak=# select version();
> version
> -----------------------------------------------------------
> PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> NOTE: I replaced your [^ ] with the properly formatted whitespace
> pattern:
> [^\s]
>
> --
> Andreas Joseph Krogh (Senior Software Developer)
> <andreak(at)officenet(dot)no>
> A hen is an egg's way of making another egg.
