Re: BUG #4200: Regexp character classes not UTF8-compliant

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4200: Regexp character classes not UTF8-compliant
Date: 2008-05-29 00:04:14
Message-ID: 200805290004.m4T04Eb15568@momjian.us
Lists: pgsql-bugs


I am not sure how to help you, except to say that UTF8 is a character set
encoding, while en_US.UTF-8 is a locale name that pairs a language and
territory with that encoding. My guess is that if you use a *.UTF-8 locale
that names the proper localization language, it would work.

http://www.postgresql.org/docs/8.2/static/locale.html
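
As a rough illustration of why encoding and character classification are
separate questions, here is a small Python sketch. It is only an analogy and
does not use PostgreSQL's regex engine: the sample string and the helper
strip_word_chars are hypothetical, and Python's re.ASCII flag merely stands in
for a classification that knows only the ASCII range.

```python
import re

# Hypothetical sample text; the non-ASCII letters are for illustration only.
raw = "b\u00e9b\u00e9\u00fc".encode("utf-8")  # encoding: characters -> bytes
text = raw.decode("utf-8")                    # decoding works either way


def strip_word_chars(s, ascii_only):
    """Remove every character that \\w classifies as a word character."""
    # re.ASCII restricts \w to [a-zA-Z0-9_]; without it, \w is Unicode-aware.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\w", "", s, flags=flags)


# ASCII-only classification leaves the accented letters behind.
ascii_result = strip_word_chars(text, ascii_only=True)

# Unicode-aware classification treats them as letters and removes them too.
unicode_result = strip_word_chars(text, ascii_only=False)

print(repr(ascii_result))
print(repr(unicode_result))
```

The bytes decode correctly in both cases; only the classification rule
differs, which is the part a locale's character-type setting is meant to
supply.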

---------------------------------------------------------------------------

Jean-Baptiste Quenot wrote:
>
> The following bug has been logged online:
>
> Bug reference: 4200
> Logged by: Jean-Baptiste Quenot
> Email address: jbq(at)caraldi(dot)com
> PostgreSQL version: 8.3.1
> Operating system: Linux Ubuntu Hardy
> Description: Regexp character classes not UTF8-compliant
> Details:
>
> PostgreSQL documentation at
> http://www.postgresql.org/docs/8.3/static/functions-matching.html describes
> the various character classes, and they can be used to match or replace
> strings with regexp support. However, the [:alnum:] and [:alpha:] character
> classes are not UTF8-compliant, as shown in the examples below:
>
> dockee=# show client_encoding;
> client_encoding
> -----------------
> UTF8
> (1 row)
>
> dockee=# show lc_ctype;
> lc_ctype
> -------------
> en_US.UTF-8
> (1 row)
>
> dockee=# select regexp_replace('bbu', '[[:alnum:]]', '', 'g');
> regexp_replace
> ----------------
>
> (1 row)
>
> ovhdev=# select regexp_replace('bbu', '[[:alpha:]]', '', 'g');
> regexp_replace
> ----------------
>
> (1 row)
>
> dockee=# select regexp_replace('bbu', $$\w$$, '', 'g');
> regexp_replace
> ----------------
>
> (1 row)
>
> Only characters in the ASCII range were correctly detected to belong to the
> [:alnum:] character class, whereas other characters are valid too.
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
