Re: Concerning about Unicode-aware string handling

From: Vincas Dargis <vindrg(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Concerning about Unicode-aware string handling
Date: 2012-05-21 14:02:47
Message-ID: CAPNCXk0zzHBEgrMXfLCV6WpB3qVhun9xOTe4sRe4ogUaspYSBg@mail.gmail.com
Lists: pgsql-general

I forgot to mention that I'm working on Windows XP SP3.

Yes, we are using UTF8 encoding, and regular expression matching gives
wrong results for non-ASCII letters. It looks like you reproduced that.
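
For comparing platforms, here is a minimal sketch of the checks that
could be run on each server. It assumes the client encoding is set so
that the literal 'ą' reaches the server intact. The column aliases are
only illustrative names. No output is shown, because the output is
exactly what seems to differ between setups.

```sql
-- Server version and platform (e.g. Windows vs. Linux build).
SELECT version();

-- Encoding and locale settings of the current database.
SELECT pg_encoding_to_char(encoding) AS db_encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();

-- Case mapping versus regular expression character classes
-- for a non-ASCII Lithuanian letter.
SELECT upper('ą') = 'Ą'        AS upper_ok,
       'ą' ~* '\w'             AS matches_w,
       'ą' ~* '[[:alpha:]]'    AS matches_alpha;
```

If upper_ok is true while matches_w is false, that would point at the
regular expression code rather than at the locale setup as a whole.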

2012/5/21 Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>:
>
> I tried it with 9.1.3 on Linux:
>
> upper() and lower() work fine, no matter what the
> database encoding is:
>
> test=> SELECT upper('acząčž');
>  upper
> --------
>  ACZĄČŽ
> (1 row)
>
> And this seems OK with LATIN7:
>
> lt2=> SHOW server_encoding;
>  server_encoding
> -----------------
>  LATIN7
> (1 row)
>
> lt2=> SHOW lc_ctype;
>  lc_ctype
> ----------
>  lt_LT
> (1 row)
>
> lt2=> SHOW lc_collate;
>  lc_collate
> ------------
>  lt_LT
> (1 row)
>
> lt2=> SELECT 'ą' ~* '\w';
>  ?column?
> ----------
>  t
> (1 row)
>
> But it looks wrong with UTF8:
>
> lt=> SHOW server_encoding;
>  server_encoding
> -----------------
>  UTF8
> (1 row)
>
> lt=> SHOW lc_ctype;
>  lc_ctype
> ------------
>  lt_LT.utf8
> (1 row)
>
> lt=> SHOW lc_collate;
>  lc_collate
> ------------
>  lt_LT.utf8
> (1 row)
>
> lt=> SELECT 'ą' ~* '\w';
>  ?column?
> ----------
>  f
> (1 row)
>
>
> Is that what you are complaining about?
>
> Yours,
> Laurenz Albe
