Re: Concerning about Unicode-aware string handling

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Vincas Dargis *EXTERN*" <vindrg(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Concerning about Unicode-aware string handling
Date: 2012-05-21 13:38:03
Message-ID: D960CB61B694CF459DCFB4B0128514C207E6AAD2@exadv11.host.magwien.gv.at
Lists: pgsql-general

Vincas Dargis wrote:
> We have problems (currently using 8.4, but also in latest 9.1.3) in
> our application with Unicode word symbols in Lithuanian ('ąčęėįšųūž'),
> Russian and of course potentially other languages.
>
> For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
>
> lower() and ~* comparisons work only with the locale that is set (no
> internationalization).
>
> Could we expect Unicode support in the near future? Or should we do quick
> hacks by reimplementing regexp_replace(), lower(), upper() and other
> string SQL functions using, for example, Qt libraries..? Or are
> there maybe some simpler workarounds?

I tried it with 9.1.3 on Linux:

upper() and lower() work fine, no matter what the
database encoding is:

test=> SELECT upper('acząčž');
 upper
--------
 ACZĄČŽ
(1 row)

And matching with \w seems OK with LATIN7:

lt2=> SHOW server_encoding;
 server_encoding
-----------------
 LATIN7
(1 row)

lt2=> SHOW lc_ctype;
 lc_ctype
----------
 lt_LT
(1 row)

lt2=> SHOW lc_collate;
 lc_collate
------------
 lt_LT
(1 row)

lt2=> SELECT 'ą' ~* '\w';
 ?column?
----------
 t
(1 row)

But it looks wrong with UTF8:

lt=> SHOW server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

lt=> SHOW lc_ctype;
  lc_ctype
------------
 lt_LT.utf8
(1 row)

lt=> SHOW lc_collate;
 lc_collate
------------
 lt_LT.utf8
(1 row)

lt=> SELECT 'ą' ~* '\w';
 ?column?
----------
 f
(1 row)

Is that what you are complaining about?
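
If it is, one possible stopgap is to avoid \w and \W and spell out
the wanted characters in a bracket expression. The following is only
an untested sketch: it assumes a UTF8 database, and the letter list is
simply the Lithuanian characters quoted above plus their upper-case
forms, not a complete alphabet.

```sql
-- Sketch only: the letter list is an assumption, copied from the
-- Lithuanian characters mentioned in the original question.
-- Keep ASCII letters, digits, underscore and the listed letters;
-- strip every other character.
SELECT regexp_replace(
         'acząčž, 123!',
         '[^a-zA-Z0-9_ąčęėįšųūžĄČĘĖĮŠŲŪŽ]',
         '',
         'g');

-- The same idea as a match test, in place of '\w':
SELECT 'ą' ~ '[a-zA-Z0-9_ąčęėįšųūžĄČĘĖĮŠŲŪŽ]';
```

The patterns contain no backslashes, so they should not depend on the
standard_conforming_strings setting. The obvious drawback is that the
list has to be extended by hand for Russian and any other language.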

Yours,
Laurenz Albe
