
Re: Problems with charsets, investigated...

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: alexandre(dot)aufrere(at)inet6(dot)fr
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Problems with charsets, investigated...
Date: 2004-08-08 02:29:15
Lists: pgsql-jdbc

(I'd appreciate a cc: on list posts as the pgsql lists can be unreliable)

Alexandre Aufrere wrote:
> Ok, seems i was really really unable to explain my problem...
> 1) Database's encoding is set to LATIN1 (we have SQL_ASCII nowhere)
> 2) JDBC driver requests data to database in UNICODE (hard-coded in driver)
> 3) String coming from database therefore are UTF-8-encoded. And they are 
> correctly transcoded from LATIN1, as the encoding is correctly specified 
> in the pg_database for that database. 

This all sounds correct.

> 4) Java stores internally as UTF-16... but that's only the internal 
> representation. Now there seems to be a problem here (see description of 
> the work-around below). 

The internal representation is always UTF-16, yes -- you must transcode 
on output in general.
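
A minimal sketch of what "transcode on output" means, assuming the target is ISO-8859-1 as in this thread. The class and method names are made up for illustration; the point is only that the output charset is named explicitly when the String is written:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TranscodeOnOutput {
    // Encode a String (UTF-16 in memory) as ISO-8859-1 bytes through a Writer,
    // naming the target charset explicitly instead of relying on a default.
    static byte[] toLatin1(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "café": the é is the single char U+00E9 inside the String,
        // and it comes out as the single byte 0xE9 (233) in ISO-8859-1.
        for (byte b : toLatin1("caf\u00e9")) {
            System.out.println(b & 0xff);
        }
    }
}
```

In a web app the same idea applies to whatever Writer or response object produces the page: it should be told which charset to use.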

> 5) Java's file.encoding system property is set to ISO-8859-1 (because we 
> have other data coming from LDAP or filesystem, which are encoded in 
> ISO-8859-1 anyway) 
> 6) Our web app choses to display Java Strings accordingly to 
> file.encoding, therefore as ISO-8859-1 
> 7) Bing ! problem: we are now interpreting UTF8-encoded strings (see point 
> 2/3) as ISO-8859-1 
> Therefore all the accentuated characters go wrong !

This implies that your web app is not transcoding correctly from UTF-16 
(internal string representation) to ISO-8859-1.

How does your web app use file.encoding exactly? Note that the 
file.encoding property does *not* control the default encoding used by 
String.getBytes(), as I understand it; the default encoding is 
JVM-controlled from the system's locale settings.
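
One way to check this on your system is to print both values and compare them. This is a sketch only (the class and helper names are hypothetical); it also shows the safer alternative of passing the charset to getBytes() explicitly, so the result does not depend on any default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // Encode with a named charset; the result does not depend on JVM defaults.
    static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // What the property says versus what the JVM actually uses by default.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // The no-argument getBytes() uses the default charset, so its length
        // for a single accented character varies between systems.
        System.out.println("default getBytes length = " + "\u00e9".getBytes().length);
        System.out.println("Latin-1 getBytes length = " + latin1Bytes("\u00e9").length);
    }
}
```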

> In all previous versions of the JDBC driver (we started with the one 
> coming along with postgresql 7.0 series) coupled with the corresponding 
> version of postgresql, the data was correctly retrieved. 

I think this was luck of the draw more than anything.

> Now, a working work-around looks like:
> String correctString = new 
> String(stringFromJdbcDriver.getBytes("ISO-8859-1"),"UTF-8"); 

This doesn't make sense at all! This means you are interpreting 
ISO-8859-1 encoded bytes as UTF-8, which is nonsense.
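
To illustrate, here is a small sketch with a hard-coded sample character (not data from your database). The misdecode() helper is hypothetical and only simulates one possible earlier fault: UTF-8 bytes that were decoded as ISO-8859-1. The workaround repairs a String damaged that way, but it destroys a String that was decoded correctly:

```java
import java.nio.charset.StandardCharsets;

public class WorkaroundDemo {
    // The workaround quoted above: re-read the String's Latin-1 bytes as UTF-8.
    static String workaround(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Simulated fault (an assumption): UTF-8 bytes decoded as if ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String correct = "\u00e9";            // é as one char, U+00E9
        String damaged = misdecode(correct);  // two chars, U+00C3 U+00A9

        System.out.println(workaround(damaged).equals(correct)); // true
        System.out.println(workaround(correct).equals(correct)); // false
    }
}
```

So if the workaround appears to help, that suggests the Strings were already wrong before it was applied.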

> My patch eliminates the problem, because the JDBC driver gets ISO-8859-1 
> (aka LATIN1) strings from the server, therefore java internal transcoding 
> into UTF-16 goes ok... 

It's still the wrong thing to do! I'm sure there is another bug here 
that is causing the underlying problem. There should be no problem with 
converting from client_encoding = UNICODE to Java's UTF-16.

What driver version *exactly* are you using? It's possible that you've 
hit a driver bug of some sort that is fixed in the current driver 
(specifically, I think build 302 was broken wrt. UTF-8 conversions -- 
but it was only available briefly). Have you tried with the current 
development driver from

Can you show me the code your web app uses to display the Strings it 
gets from the driver in ISO-8859-1?

Can you dump out the *characters* of the problem Strings you get from 
the driver, one character at a time, and see what numeric values you're 
getting and whether they are the right UTF-16 values you expect? i.e.

  for (int i = 0; i < str.length(); ++i) {
    System.out.println(" offset " + i + " value " + (int) str.charAt(i));
  }
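
As a guide to reading that dump, here is a self-contained sketch using hard-coded sample strings (the class name is made up). A correctly decoded é is one char with value 233, while UTF-8 bytes that were decoded as ISO-8859-1 would show up as two chars, 195 then 169:

```java
import java.util.Arrays;

public class CharDump {
    // Collect the numeric value of each char, as the loop above prints them.
    static int[] values(String str) {
        int[] out = new int[str.length()];
        for (int i = 0; i < str.length(); ++i) {
            out[i] = str.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(values("\u00e9")));       // [233]
        System.out.println(Arrays.toString(values("\u00c3\u00a9"))); // [195, 169]
    }
}
```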

Can you provide a pg_dump (LATIN1 encoding I assume) plus sample 
testcase that shows off the problem?

