Re: Regexps vs. locale

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Regexps vs. locale
Date: 2009-01-07 04:44:24
Message-ID: 200901070444.n074iOM19932@momjian.us
Lists: pgsql-hackers


Added to TODO:

Add ability to use case-insensitive regular expressions on multi-byte
characters

ILIKE already works with multi-byte characters

* http://archives.postgresql.org/pgsql-hackers/2008-12/msg00433.php

---------------------------------------------------------------------------

Andrew Gierth wrote:
> This came up on irc:
>
> postgres=# show lc_ctype;
> lc_ctype
> -------------
> fr_FR.UTF-8
>
> postgres=# show server_encoding;
> server_encoding
> -----------------
> UTF8
> (1 row)
>
> postgres=# select E'\303\201' ILIKE E'\303\241';
> ?column?
> ----------
> t
> (1 row)
>
> postgres=# select E'\303\201' ~* E'\303\241';
> ?column?
> ----------
> f
> (1 row)
>
> Obviously, this happens because the locale support functions in
> backend/regex/regc_locale.c are (presumably intentionally) crippled so
> as not to support non-ASCII chars, despite all the code there using
> wide chars for everything otherwise.
>
> Why is this? It does not appear to be a documented restriction.
>
> --
> Andrew (irc:RhodiumToad)
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
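The mismatch in the quoted report can be illustrated outside the backend. In UTF-8, E'\303\201' and E'\303\241' decode to U+00C1 (Á) and U+00E1 (á), both outside the ASCII range. The sketch below is not the code in backend/regex/regc_locale.c. It is a hypothetical, self-contained C illustration whose function names are made up for this example: fold_ascii_only() models the ASCII-only restriction the report describes, while fold_wide() uses the standard towlower(), whose result depends on the LC_CTYPE selected with setlocale(). It assumes wchar_t values are Unicode code points, which holds on glibc.

```c
#include <stdbool.h>
#include <stdint.h>
#include <wctype.h>

/* Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code
 * point. Returns -1 if the bytes are not a valid two-byte sequence.
 * Enough to show that E'\303\201' is U+00C1 and E'\303\241' is U+00E1. */
int32_t decode_utf8_2byte(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;
    int32_t cp = ((int32_t) (b0 & 0x1F) << 6) | (b1 & 0x3F);
    return cp >= 0x80 ? cp : -1;   /* reject overlong encodings */
}

/* Hypothetical ASCII-only case fold, modelling the restriction described
 * in the report: only 'A'..'Z' are folded, every other code point
 * (including U+00C1) is returned unchanged. */
int32_t fold_ascii_only(int32_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Locale-aware case fold via towlower(). The mapping comes from the
 * current LC_CTYPE, so setlocale(LC_CTYPE, ...) must have selected a
 * UTF-8 locale (e.g. fr_FR.UTF-8) for non-ASCII letters to fold.
 * Assumes wchar_t holds Unicode code points (true on glibc). */
int32_t fold_wide(int32_t c)
{
    return (int32_t) towlower((wint_t) c);
}

/* Case-insensitive comparison of two code points using the given fold. */
bool chars_equal_ci(int32_t a, int32_t b, int32_t (*fold)(int32_t))
{
    return fold(a) == fold(b);
}
```

With the ASCII-only fold, chars_equal_ci(0xC1, 0xE1, fold_ascii_only) is false, mirroring the f returned by ~*. After setlocale(LC_CTYPE, "fr_FR.UTF-8") succeeds, chars_equal_ci(0xC1, 0xE1, fold_wide) would be expected to return true on glibc, which is the behaviour ILIKE already shows.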
