Re: Problems with charsets, investigated...

From: Alexandre Aufrere <alexandre(dot)aufrere(at)inet6(dot)fr>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Problems with charsets, investigated...
Date: 2004-08-06 18:32:08
Message-ID: 20040806183208.75F47400E5@smtp.ies.inet6.fr
Lists: pgsql-jdbc

Well, no, actually I want to use LATIN1/ISO-8859-1 everywhere.
So my appserver should get ISO-8859-1 strings from the driver, and not
UTF-8.
Why? Because we have a whole bunch of apps developed in ISO-8859-1, as
well as a lot of data in LATIN1, and it's out of the question to move
everything to UTF-8/UNICODE.

For me, the driver should encode strings according to the system
properties of the JVM it is running in. Or at least there should be a way
to tell the driver what charset to use. In other words, the current
behaviour is precisely NOT transparent to me, because I end up with a
database in LATIN1, whose data is converted to UTF-8 before I retrieve it
from the JDBC driver, which 1) would give me more work to convert back to
ISO-8859-1, and 2) would not be backward compatible (meaning we would have
to test a LOT of apps again to check we're breaking nothing).
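
As a side note on the conversion step mentioned above: a Java String holds
Unicode characters rather than bytes in a particular charset, so an explicit
conversion is only needed where text is turned into bytes (files, sockets,
HTTP responses). A minimal sketch of that boundary, assuming the application
writes its output itself; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {
    // Encode a Java String (Unicode internally) as ISO-8859-1 bytes.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode ISO-8859-1 bytes back into a Java String.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "d\u00e9velopp\u00e9"; // contains two accented characters
        byte[] latin1 = toLatin1(text);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // The same String yields different byte sequences per charset.
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("round trip ok:    " + fromLatin1(latin1).equals(text));
    }
}
```

Whether this covers a given application depends on where it produces bytes;
code that relies on the platform default charset is affected by
file.encoding rather than by the driver.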

So my hack just reads the file.encoding Java system property, requests
data from the PostgreSQL server in the matching encoding, and handles it
accordingly (namely, if file.encoding is ISO-8859-1, it requests LATIN1
and handles everything it gets as ISO-8859-1).
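
The mapping described above could be sketched roughly as follows. This is
not the actual patch, only an illustration: the class, the pgEncodingFor
helper and the mapping table are hypothetical, and the table is
deliberately incomplete. Falling back to UTF8 for unknown charsets is an
assumption made for the sketch.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class EncodingMapping {
    // Hypothetical, incomplete table: canonical Java charset name
    // to PostgreSQL encoding name.
    private static final Map<String, String> JAVA_TO_PG = Map.of(
            "ISO-8859-1", "LATIN1",
            "UTF-8", "UTF8");

    // Resolve aliases such as "latin1" or "ISO8859_1" to the canonical
    // Java name, then look up the PostgreSQL name (assumed default: UTF8).
    static String pgEncodingFor(String javaCharsetName) {
        String canonical = Charset.forName(javaCharsetName).name();
        return JAVA_TO_PG.getOrDefault(canonical, "UTF8");
    }

    public static void main(String[] args) {
        String fileEncoding = System.getProperty("file.encoding", "UTF-8");
        String pgEncoding = pgEncodingFor(fileEncoding);
        // SET client_encoding is a standard PostgreSQL statement; the hack
        // would have the server convert data to this encoding.
        System.out.println("SET client_encoding TO '" + pgEncoding + "'");
    }
}
```

Note that, per the reply quoted below, the driver itself expects to talk
UTF-8 to the server, so this only illustrates the mapping step.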

Now, IMHO, ideally, the default behaviour of the JDBC driver should be to
read the encoding from the pg_database table and deduce from it which
encoding to use for strings. And of course, there should be an easy way to
change that for people who want it another way.
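
For illustration, looking up the database encoding from pg_database could
look something like the sketch below. It assumes an already open
java.sql.Connection; the class and method names are hypothetical, and the
name-to-charset mapping only covers the encodings discussed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Locale;

final class DatabaseEncoding {
    // pg_encoding_to_char() turns the numeric pg_database.encoding value
    // (for example 8) into its name (for example LATIN1).
    static final String QUERY =
            "SELECT pg_encoding_to_char(encoding) FROM pg_database "
            + "WHERE datname = current_database()";

    // Ask the server for the encoding name of the current database.
    static String serverEncodingName(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(QUERY)) {
            if (!rs.next()) {
                throw new SQLException("no pg_database row for current database");
            }
            return rs.getString(1);
        }
    }

    // Hypothetical, incomplete mapping from PostgreSQL encoding names
    // to Java charsets.
    static Charset javaCharsetFor(String pgEncodingName) {
        switch (pgEncodingName.toUpperCase(Locale.ROOT)) {
            case "LATIN1":
                return StandardCharsets.ISO_8859_1;
            case "UTF8":
            case "UNICODE": // older name for the same encoding
                return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException(
                        "unmapped PostgreSQL encoding: " + pgEncodingName);
        }
    }
}
```

A caller would pass the result of serverEncodingName(conn) to
javaCharsetFor(...) and could override the outcome through some
configuration option, which is the "easy way to change that" part.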

I don't know exactly how it worked in previous versions; the fact is
that with the LANG environment variable set everywhere to en_US.ISO-8859-1
and the encoding in pg_database set to 8 (LATIN1), it just worked (we have
been using PostgreSQL+Java+Enhydra for a very long time). Any change that
forces us to handle charsets explicitly might be "ideally" right, but it
is not backward compatible and will cause us a lot of problems (and I'm
quite sure not only us).

Lastly, it's quite possible that I missed something somewhere, so I
apologize in advance for being utterly dumb ;-)

Regards,

Alexandre Aufrere

----------------------------------------------------
From: Kris Jurka <books(at)ejurka(dot)com>
To: Alexandre Aufrere <alexandre(dot)aufrere(at)inet6(dot)fr>
Subject: Re: [JDBC] Problems with charsets, investigated...
Date: Fri, 6 Aug 2004 11:05:54 -0500 (EST)
>
>
> On Fri, 6 Aug 2004, Alexandre Aufrere wrote:
>
> > Java correctly sets its file.encoding property to the charset
> > specified in the LANG environment variable. However, it appears that
> > whatever i set this variable to, the JDBC driver seems to use UTF-8.
> >
>
> I'm not sure what problem or issue you think this is addressing, but it
> is not something we want to do. The driver communicates with the server
> using UTF-8, so you should not be adjusting this and it is entirely
> transparent to the user. What you do after retrieving data is your
> business and you are welcome to save it or display it in any encoding
> you desire, but the driver wants to communicate with the server using
> UTF-8.
>
> Kris Jurka
