Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields

From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Johann Zuschlag <zuschlag2(at)online(dot)de>
Cc: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-odbc(at)postgresql(dot)org
Subject: Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields
Date: 2006-03-30 21:35:12
Message-ID: 442C4F10.3090004@tpf.co.jp
Lists: pgsql-odbc

Johann Zuschlag wrote:

> Dave Page schrieb:
>
>> If 'Ã¶' is 'ö', then isn't the query above mixing a single-byte and a 
>> multibyte encoding? I.e. it should all be single byte - e.g.
>>
>> select name from kunde where name >= 'ö' order by name asc;
>>
>> Or all multibyte (displayed byte by byte) whatever that results in:
>>
>> s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e* 
>> *>*=* *'*ö'*;*
>>
>> Of course, we all know how well I grok encoding issues :-)
>>   
>
> Hi Dave,
>
> I can understand you. These encoding issues drive me crazy sometimes, 
> too. :-)
>
> The problem with UTF-8 is that all ASCII characters are represented by 
> one byte, while non-ASCII characters take two or more bytes (German 
> umlauts, for example, take two). That's why UTF-8 is called a 
> "variable-length multibyte encoding". In a pure two-byte Unicode world 
> (UCS-2, i.e. U+xxxx stored in two bytes), every character is represented 
> by two bytes (fixed-length multibyte encoding). So Unicode is not equal 
> to UTF-8, even though the PostgreSQL documentation states that.
>
> If you like, see: http://www.utf8-chartable.de/ or some explanation at 
> http://czyborra.com/utf/
>
> Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian. 
> Unfortunately (or fortunately?) Windows seems to use UTF-8 for 
> European languages. Hiroshi, can you explain that? I guess the Japanese 
> edition of Windows XP is using pure 2-byte Unicode.


Unicode ODBC drivers handle UCS-2, not UTF-8, even in a European environment.
Unfortunately, PostgreSQL doesn't handle UCS-2 directly (because the strings
could contain NUL bytes), so the Unicode driver sets client_encoding to UTF-8
automatically and, when sending queries, converts the UCS-2 data to UTF-8
data that the PostgreSQL backend can understand. So what you see in the
backend log is UTF-8. The backend then converts the UTF-8 data to the
server encoding. In the end, the locale setting you need (especially
LC_COLLATE) is the one that matches the backend encoding.
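
To illustrate the chain above, here is a rough Python sketch. It is not
psqlODBC code: the helper names ucs2_to_utf8 and utf8_to_server are made up
for illustration, LATIN1 is only an assumed server encoding, and UTF-16LE is
used as a stand-in for UCS-2 (the two agree for characters in the Basic
Multilingual Plane, which covers the umlauts discussed here).

```python
# Illustrative sketch only; the function names are hypothetical and this
# merely mimics the conversions with Python's built-in codecs.

def ucs2_to_utf8(ucs2_bytes: bytes) -> bytes:
    """Driver side (assumed): UCS-2 from the application -> UTF-8 on the wire."""
    return ucs2_bytes.decode("utf-16-le").encode("utf-8")


def utf8_to_server(utf8_bytes: bytes, server_encoding: str = "latin-1") -> bytes:
    """Backend side (assumed): UTF-8 client data -> server encoding."""
    return utf8_bytes.decode("utf-8").encode(server_encoding)


UMLAUT = "\u00f6"  # the character o-umlaut
query = "select name from kunde where name >= '" + UMLAUT + "'"

ucs2 = query.encode("utf-16-le")   # what a Unicode ODBC application passes in
utf8 = ucs2_to_utf8(ucs2)          # what shows up in the backend log
latin1 = utf8_to_server(utf8)      # what is compared under LC_COLLATE

# Every ASCII character in UCS-2 carries a zero byte, which is why the
# backend cannot take such strings directly.
print(b"\x00" in ucs2)   # True
print(b"\x00" in utf8)   # False

# The same character in the three encodings:
print(UMLAUT.encode("utf-16-le").hex())  # f600
print(UMLAUT.encode("utf-8").hex())      # c3b6
print(UMLAUT.encode("latin-1").hex())    # f6
```

Reading the two UTF-8 bytes c3 b6 as if they were LATIN1 gives the
two-character sequence "Ã¶", which is the kind of output that shows up when
a log is viewed with the wrong encoding.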

>
> I can't say anything about psql. But the new psqlodbc driver 7.03.26X 
> seems to handle that situation very well.
>
> So I suppose the test was valid to a certain extent, 


Yes, thanks. I can't test the LATINxx encodings myself.

regards,
Hiroshi Inoue
