Charset encoding and accents

From: Davide Romanini <romaz(at)libero(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Charset encoding and accents
Date: 2003-04-10 09:04:37
Message-ID: 3E9533A5.8070805@libero.it
Lists: pgsql-hackers pgsql-jdbc

Hi,

I've posted this problem twice on the pgsql-jdbc list, but no one was
able to help me solve it. I think this is a really serious problem in
the JDBC driver. I've tried several solutions without success.

Well, let me explain the problem. I have a working PostgreSQL database.
An application written in MS Access uses it through the ODBC driver
with no problems. I want to access the same data from a Swing
application through the JDBC driver.
On the server side the encoding is set to SQL_ASCII. That is not a
problem in itself, because all strings containing accented characters
are retrieved correctly both by ODBC and by the psql client.
But if I retrieve strings containing accented characters (like àòù)
through JDBC, I run into trouble: the accents get corrupted. For
example, the string 'La città di Forlì' is retrieved and displayed as
'La citt?di Forl?'!
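
The symptom is what you would expect when single-byte text is decoded as
UTF-8. The sketch below assumes (this is not stated above) that the
SQL_ASCII database holds the text as Latin-1 (ISO-8859-1) bytes; the
class name AccentDemo is hypothetical. Decoding those bytes as UTF-8
replaces the accented characters, while decoding them with the charset
they were written in restores the string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AccentDemo {
    // Decode raw bytes with the given charset, as String(byte[], Charset) does.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String original = "La citt\u00e0 di Forl\u00ec";
        // Assumption: the SQL_ASCII database stores the text as Latin-1 bytes.
        byte[] stored = original.getBytes(StandardCharsets.ISO_8859_1);
        // 0xE0 (a-grave) and 0xEC (i-grave) are UTF-8 lead bytes without valid
        // continuation bytes, so the JDK decoder substitutes U+FFFD for them.
        System.out.println(decode(stored, StandardCharsets.UTF_8));
        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(stored, StandardCharsets.ISO_8859_1));
    }
}
```

The JDK decoder keeps the following space, so the exact garbling
differs from the 'citt?di' shown above, which came from the driver's own
UTF-8 routine; the cause is the same.
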

I've dug a bit into the driver's source code. I noticed that when I
call rs.getString(), the driver (at some point) invokes the method
org.postgresql.core.Encoding.decode(byte[] encodedString, int offset,
int length).
This method calls decodeUTF8() when the current encoding equals
"UTF-8". If the encoding is different, it simply returns
new String(encodedString, offset, length, encoding).
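
A minimal sketch of the dispatch just described, so the two branches
are easy to see. This is a hypothetical reconstruction, not the driver's
source: the class name EncodingSketch is made up, and the driver's
hand-written decodeUTF8() is replaced by the JDK's UTF-8 decoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction of the decode() dispatch; not the driver's code.
class EncodingSketch {
    private final String encoding; // charset name chosen when the connection opens
    private final Charset charset;

    EncodingSketch(String encoding) {
        this.encoding = encoding;
        // Throws an unchecked exception if the JVM does not know the name.
        this.charset = Charset.forName(encoding);
    }

    String decode(byte[] encodedString, int offset, int length) {
        if ("UTF-8".equals(encoding)) {
            // Stand-in for the driver's decodeUTF8(encodedString, offset, length).
            return new String(encodedString, offset, length, StandardCharsets.UTF_8);
        }
        // Any other encoding: plain JDK decoding with that charset.
        return new String(encodedString, offset, length, charset);
    }
}
```

The second branch is only reachable when `encoding` is something other
than "UTF-8", so if the connection always ends up with "UTF-8", no
connection parameter can influence the result.
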
Since my database is SQL_ASCII, the JDBC driver should return a new
String and not call decodeUTF8(). But when I step through the source in
a debugger, the encoding is ALWAYS UTF-8! I've also tried setting a
parameter in my connection string:
jdbc:postgresql://localhost/prova?charSet=SQL_ASCII (I've tried many
different encodings here). The encoding is always UTF-8.
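
One detail worth checking here: SQL_ASCII is a PostgreSQL encoding
name, not a Java charset name, so even if the driver did pass the
charSet value through to new String(..., encoding), that particular
value could not work. The helpers below are hypothetical (not part of
the driver) and assume the value is looked up with java.nio.charset.Charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helpers for illustration; not part of the PostgreSQL JDBC driver.
class CharSetParam {
    // Returns the value of the "charSet" query parameter of a JDBC URL, if any.
    static Optional<String> charSetFromUrl(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return Optional.empty();
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("charSet")) {
                return Optional.of(pair.substring(eq + 1));
            }
        }
        return Optional.empty();
    }

    // Looks the name up among the JVM's charsets; PostgreSQL names such as
    // SQL_ASCII are not Java charset names and come back empty.
    static Optional<Charset> javaCharset(String name) {
        try {
            return Charset.isSupported(name) ? Optional.of(Charset.forName(name)) : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            return Optional.empty(); // e.g. an empty value in "?charSet="
        }
    }
}
```

A Java name such as ISO-8859-1 (or Cp1252 for Windows-entered text)
is the kind of value the second branch of decode() could actually use.
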
So I thought: 'if the driver wants strings to be UNICODE, I'll set the
server variable CLIENT_ENCODING to UNICODE'. No result! It doesn't change!
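
A plausible explanation: the PostgreSQL documentation states that no
encoding conversion is performed when the server encoding is SQL_ASCII,
because the server does not know what the high bytes mean. The
simulation below (hypothetical class, assuming Latin-1 storage) contrasts
a server that transcodes to the client encoding with one that passes
the stored bytes through unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simulation only: models the bytes a client would receive, not real server code.
class ClientEncodingSim {
    // A server that knows the stored encoding can transcode to the client encoding.
    static byte[] convert(byte[] stored, Charset storedAs, Charset clientEncoding) {
        return new String(stored, storedAs).getBytes(clientEncoding);
    }

    // With SQL_ASCII no conversion is done: the stored bytes pass through as-is.
    static byte[] passThrough(byte[] stored) {
        return stored.clone();
    }

    public static void main(String[] args) {
        String text = "La citt\u00e0 di Forl\u00ec";
        // Assumption: the accented text was stored as Latin-1 bytes.
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] converted = convert(stored, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        byte[] raw = passThrough(stored);
        // A client that always decodes UTF-8 gets the right text only in the first case.
        System.out.println(new String(converted, StandardCharsets.UTF_8));
        System.out.println(new String(raw, StandardCharsets.UTF_8));
    }
}
```
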
The only way to get my strings displayed correctly is to comment out
all the decodeUTF8() calls and make the method return new String(data).
So I think that if the encoding were correctly recognized as something
other than UTF-8, the decode method would return the new String, which
is the correct behaviour in my case.
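
A less drastic variant of that workaround, sketched under assumptions:
instead of removing UTF-8 decoding entirely, try a strict UTF-8 decode
and fall back to a fixed single-byte charset when the bytes are not
valid UTF-8. FallbackDecoder is a hypothetical name and this is not a
patch to the driver. Note that new String(data) as used above decodes
with the JVM's default charset, which depends on the platform; naming
the fallback charset explicitly avoids that.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical workaround sketch; not part of the PostgreSQL JDBC driver.
class FallbackDecoder {
    private final Charset fallback;

    FallbackDecoder(Charset fallback) {
        this.fallback = fallback;
    }

    String decode(byte[] data, int offset, int length) {
        // Fresh strict decoder per call (decoders are not thread-safe);
        // REPORT makes malformed input throw instead of yielding U+FFFD.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(data, offset, length)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the bytes are in the single-byte fallback charset.
            return new String(data, offset, length, fallback);
        }
    }
}
```

This is a heuristic: Latin-1 text that happens to form valid UTF-8 (for
example the two characters "Ã©", bytes C3 A9) would be decoded as "é".
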

Please don't tell me to change my database to UNICODE. I cannot do
that, and I do not WANT to do that. Why does the ODBC driver work fine
while the JDBC driver works only with UNICODE databases?? It's a bug and
it should be fixed. If I were skilled enough I would fix the bug myself,
but I don't know much about the JDBC standard.

I hope someone can answer me with a solution. Really, with this bug the
driver is simply unusable for serious work.

The problem is not solved in the latest stable (version 7.3 build 109)
or development (version 7.4 build 204) releases of the driver.

Regards, Romaz
--
Davide Romanini
