Re: Problem with accessing Russian UTF database

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Ronald Vyhmeister <rvyhmeister(at)gmail(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Problem with accessing Russian UTF database
Date: 2008-11-25 23:09:54
Message-ID: 492C85C2.8040602@opencloud.com
Lists: pgsql-jdbc

Ronald Vyhmeister wrote:
> I'm having real trouble with the jdbc driver for postgres... I just
> installed the latest version...
>
> I have a database, UTF8 encoded, which has data in Russian. I can view it
> beautifully using PGAdmin3 or any other ODBC connection.

Perhaps these connections are not actually using UTF8 to interpret the
data, but some other encoding. In that case they would appear to write
data and read it back correctly, yet the stored bytes would not be the
text you expect when interpreted as UTF8.

> String URLdb =
> "jdbc:postgresql://127.0.0.1:5432/oldzautest?user=noe&password=genesis&charSet=UNICODE";

You should not need "charSet=UNICODE", though I don't think it'll break
anything.

> <data>
> <db_content>
> <row>
> <contents content = "1" />
> <contents content = "1" />
> <contents content = "?????" />
> <contents content = "????????" />
> <contents content = "?????????" />
> <contents content = "1965-03-10" />
> <contents content = "1" />
> </row>
> </db_content>
> </data>

Perhaps the problem is in the encoding you are using to write out that
XML fragment? Or in whatever tool you are using to view it?
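As a rough illustration of how the output step alone can produce those
question marks, here is a minimal sketch. The class and method names are
made up for this example, and ISO-8859-1 merely stands in for whatever
non-Unicode default encoding a platform might pick; I don't know what
your code actually uses to write the XML.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class XmlEncodingDemo {
    // Hypothetical helper: writes one <contents> element through a Writer
    // that uses the given charset, then decodes the bytes with the same charset.
    static String writeAndReadBack(String value, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write("<contents content = \"" + value + "\" />");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes so the encoding of this
        // source file cannot affect the result.
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // ISO-8859-1 cannot represent Cyrillic, so each letter is replaced by '?'.
        System.out.println(writeAndReadBack(cyrillic, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so the text survives.
        System.out.println(writeAndReadBack(cyrillic, StandardCharsets.UTF_8).contains(cyrillic));
    }
}
```

If the String coming back from the driver is fine but the file is not,
passing an explicit UTF-8 charset to the writer is the usual fix.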

> I've set the client_encoding to UTF8 on the server... What am I doing
> wrong? What am I missing? I'd be thrilled to interact privately with
> someone who has solved what for now is a mystery to me.

You shouldn't need to touch client_encoding for JDBC to work (though
other clients might need it). The JDBC driver forces client_encoding to
UTF8 anyway on connection startup.

It may be useful to examine the actual values of the characters in the
String objects you are dealing with (e.g. print out (int)s.charAt(0)
and so on) to check that they contain the Unicode code points you were
expecting.
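A minimal sketch of that check follows. The class name, the helper and
the sample strings are my own; in your code you would pass in the String
returned by ResultSet.getString() instead.

```java
public class CodePointDump {
    // Formats every Unicode code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Cyrillic letters, written as escapes so the encoding of this
        // source file cannot affect the result.
        System.out.println(describe("\u041F\u0440"));   // prints "U+041F U+0440"

        // A string that has already been reduced to literal question marks.
        System.out.println(describe("??"));             // prints "U+003F U+003F"
    }
}
```

Values in the U+0400 to U+04FF range mean the String really holds
Cyrillic and the damage happens later, on output or display. A run of
U+003F means the text was already lost before it reached that point.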

In general the driver "just works" with UTF-8 encoded databases. It
deals in Unicode strings internally, so the only transcoding that goes
on is from UTF-8 to UTF-16, which is lossless. All the reported
problems we've seen in the (recent) past with this configuration have
been one of three things: non-JDBC clients getting confused, problems
with how the resulting String was displayed to the user, or non-Unicode
garbage stored in the database in the first place.
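To illustrate the "lossless" part, here is a small sketch using only the
standard library. It is not the driver's code, and the class and method
names are invented for the example. It decodes UTF-8 bytes into a Java
String (UTF-16 internally) and encodes them again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decodes UTF-8 bytes into a String, then encodes that String back to UTF-8.
    static byte[] decodeThenEncode(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes so the encoding of this
        // source file cannot affect the result.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Each of these letters takes two bytes in UTF-8 and one char in UTF-16.
        System.out.println(utf8.length + " bytes, " + original.length() + " chars");

        // For well-formed UTF-8 input the round trip returns identical bytes.
        System.out.println(Arrays.equals(utf8, decodeThenEncode(utf8)));
    }
}
```

This prints "12 bytes, 6 chars" followed by "true". The guarantee only
covers well-formed UTF-8; invalid byte sequences are replaced during
decoding and do not survive the round trip.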

-O
