Re: Problem with accessing Russian UTF database

From: "Ronald Vyhmeister" <rvyhmeister(at)aiias(dot)edu>
To: "'Oliver Jowett'" <oliver(at)opencloud(dot)com>
Cc: "'Ronald Vyhmeister'" <rvyhmeister(at)gmail(dot)com>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Problem with accessing Russian UTF database
Date: 2008-11-26 00:04:38
Message-ID: E765DB637B834673B2523839EE52E5BB@peregrino
Lists: pgsql-jdbc


Ronald Vyhmeister wrote:

> Locale locale = Locale.getDefault();
> locale = new Locale("ru", "RU");

>The driver ignores locale so this won't actually be doing anything.

> SQL = "update sys_people set middle_name='фывфывафыва' where
> family_name='Pratt';";

>I wouldn't rely on your JSP implementation / java compiler interpreting
>that string literal in the way that you assume. I suggest you construct
>your string with \uNNNN unicode escapes to be sure you're really
>compiling what you think you're compiling. Your mail headers claim a
>charset of "koi8-r" but I don't know what the default file encoding for
>your target system is; perhaps it is using ISO-8859-1 or similar, which
>might result in the above being interpreted as the accented characters
>you see in PgAdmin?
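
The ISO-8859-1 hypothesis above can be checked in a few lines. The sketch below is an illustration only. It assumes the source bytes were windows-1251; that charset is a guess, not something stated in the thread (the mail headers mention koi8-r). The class and method names are made up for the example. It encodes a short Russian string in one charset and decodes the bytes in another, then reverses the process to show that the damage is lossless:

```java
import java.nio.charset.Charset;

final class MojibakeDemo {
    /** Takes the bytes of s in charset 'actual' and decodes them as if they were 'assumed'. */
    static String misdecode(String s, Charset actual, Charset assumed) {
        return new String(s.getBytes(actual), assumed);
    }

    public static void main(String[] args) {
        // ASSUMPTION: the file was saved as windows-1251 and compiled as ISO-8859-1.
        Charset cp1251 = Charset.forName("windows-1251");
        Charset latin1 = Charset.forName("ISO-8859-1");

        // Cyrillic ef, yeru, ve, written with escapes so this file stays pure ASCII.
        String russian = "\u0444\u044b\u0432";

        String garbled = misdecode(russian, cp1251, latin1);
        String restored = misdecode(garbled, latin1, cp1251);

        System.out.println(garbled);                  // accented Latin letters
        System.out.println(restored.equals(russian)); // true: the round trip is lossless
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, the wrong decode loses nothing. Any later step that makes the matching wrong encode hands the original bytes back, which may be why garbled text can still display as Russian further down the line.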

Thank you! I think we're getting much closer to a solution... but here I'll need some help. I added this line right after creating the statement:

out.print(SQL);

and this is the text I got:

update sys_people set middle_name='testing' where family_name='Smith';
update sys_people set middle_name='ôûâôûâàôûâà' where family_name='Pratt';

This matches the text I have in the DB perfectly... so the question now is how to get it to work right. As for the Unicode escapes, how do I determine them? And why is it that when I read the "garbage" back from the DB, it shows perfect Russian characters again?
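
On determining the escapes: each \uNNNN is the character's UTF-16 code unit in four hex digits, so a small helper can print them for any string. The sketch below is a minimal illustration with hypothetical class and method names. It has to be run in an environment where the input string is already decoded correctly. The `native2ascii` tool shipped with JDKs of this era does a similar conversion on whole files.

```java
final class UnicodeEscapes {
    /** Rewrites every non-ASCII char of s as a backslash-u escape with four hex digits. */
    static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Use the first argument if given; otherwise a sample (Cyrillic ef, yeru, ve).
        String input = args.length > 0 ? args[0] : "\u0444\u044b\u0432";
        System.out.println(toUnicodeEscapes(input));
    }
}
```

The printed text can be pasted into a Java string literal, which keeps the source file pure ASCII and takes the compiler's file encoding out of the picture.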

I've tried the jikes, Sun javac, and gcj compilers with JDK 1.6 EE (and the symptoms have been the same ever since I was working with language en_US.UTF-8 and JDK 1.5). Supposedly they all take their parameters from the environment (which I believe is set right)...

The server's locale settings (Ubuntu 8.04 LTS) are:

LANG=ru_RU.UTF-8
LANGUAGE=ru
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
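
The locale variables of a login shell are not necessarily what the JVM running the application sees, so it can help to ask the JVM directly. The sketch below uses a hypothetical class name and would need to run inside the same process as the application, or be adapted into the JSP. It prints the `file.encoding` system property and the default charset. JSP pages may also have their own page encoding, separate from the platform default, so this only checks one piece of the puzzle.

```java
import java.nio.charset.Charset;

final class EncodingCheck {
    /** Reports the encoding settings this JVM is actually using. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```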

>Also, as I suggested earlier, try examining your strings
>character-by-character to check that they really contain the codepoints
>you think they contain.

Right now, the string I'm entering is typed from the keyboard in Russian mode (and yes, I've tried it from both Linux and Windows, and the results are the same).
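
The character-by-character check suggested above can be sketched as follows. This is a minimal illustration with a hypothetical class name. It prints each code point of a string in U+XXXX form, so genuine Cyrillic (the U+04xx range) can be told apart from accented Latin-1 characters (the U+00xx range) regardless of how a terminal or browser renders them.

```java
final class CodePointDump {
    /** Returns the code points of s as space-separated U+XXXX values. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u0444\u044b\u0432")); // real Cyrillic
        System.out.println(dump("\u00f4\u00fb\u00e2")); // accented Latin look-alike garbage
    }
}
```

Calling `dump(SQL)` just before the statement is executed would show which of the two ranges the string actually contains.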

Thanks again for all the help,

Ron
