Re: Character Encoding problem

From: "antony baxter" <antony(dot)baxter(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Character Encoding problem
Date: 2008-04-07 03:34:01
Message-ID: 3ee066b40804062034w338d5320s11df94cd126ab60e@mail.gmail.com
Lists: pgsql-jdbc

One thing I forgot to add: I also tried, for example:

ps.setString(1, new String(Charset.forName("UTF-8").encode(myString).array(), "UTF-8"));

to be absolutely certain that I was passing UTF-8 to the database. This threw:

22047 [Thread-2] DEBUG com.test.database.postgresql.Dao - PSQL Exception State: 22021
22047 [Thread-2] DEBUG com.test.database.postgresql.Dao - PSQL Exception Message: invalid byte sequence for encoding "UTF8": 0x00
22051 [Thread-2] ERROR com.test.database.postgresql.Dao - Error Storing Data:
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:343)
at com.test.database.postgresql.Dao.store(Dao.java:197)
...

I presume that this is because the JDBC driver is expecting the JVM's
internal UTF-16 String representation?
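
A possible explanation that does not involve the driver, offered as a sketch under assumptions rather than a confirmed diagnosis: ByteBuffer.array() returns the buffer's entire backing array and ignores its position and limit. If the encoder allocated more capacity than it used, the unused tail is zero bytes, which decode to U+0000 characters, and PostgreSQL does not accept those in text values. Whether padding appears depends on the input and the JDK's allocation strategy. The class and method names below are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {

    // Mirrors the expression from the message: array() exposes the whole
    // backing array, including any unused capacity past the buffer's limit.
    // Assumes a heap buffer with array offset 0, as Charset.encode returns in practice.
    static String viaBackingArray(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        return new String(buf.array(), StandardCharsets.UTF_8);
    }

    // Copies only the bytes between position and limit, so no padding leaks in.
    static String viaBufferLimits(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic sample text
        String padded = viaBackingArray(s);
        String exact = viaBufferLimits(s);
        System.out.println("backing array: length=" + padded.length()
                + ", contains NUL=" + (padded.indexOf('\0') >= 0));
        System.out.println("buffer limits: length=" + exact.length()
                + ", equals input=" + exact.equals(s));
    }
}
```

Either way, the String-to-bytes-to-String round trip yields at best the same String, so ps.setString(1, myString) on its own should pass the same characters to the driver, which handles the wire encoding itself.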

Ant

On Mon, Apr 7, 2008 at 8:29 AM, antony baxter <antony(dot)baxter(at)gmail(dot)com> wrote:
> Hi,
>
> I'm having a character set problem, and I wonder if anyone here could
> sanity check what I'm doing. It might well be that the problem lies
> elsewhere.
>
> My database was created with -E UNICODE, and when I do a \l in psql it
> is listed as UTF8.
>
> My Java application is receiving data over a socket which is encoded
> in UTF8. I'm logging this and it is displaying e.g. Cyrillic or Greek
> correctly (using OSX Terminal.app, which supports UTF8, tailing the log
> with 'less' and the environment variable LESSCHARSET=utf-8).
>
> I'm persisting this data using the latest 8.3 JDBC drivers into
> PostgreSQL 8.3.0. I'm not changing the client_encoding (I tried, but I
> understand that the JDBC drivers set it to UNICODE anyway, and throw
> an error if I attempt to change it to anything else). The data writes
> fine, and if I then do a SELECT and a resultSet.getString(x) and write
> the output to the log, everything still looks fine. I'm therefore
> satisfied that Java + JDBC drivers + PostgreSQL are able to write &
> read the data fine. So far so good.
>
> However, if I look at the data using psql, it is mangled. If I
> try a manual UPDATE via psql using the data cut'n'pasted from my log,
> and then look at the data, it reads correctly. Therefore I know that
> psql is capable of reading and writing UTF8 data correctly. Also, the
> client application that reads from my database is Perl, and this also
> retrieves mangled data; we've tried writing and reading directly from
> Perl, and in this case reviewing the data in psql looks normal.
>
> The conclusion I've reached is that Java + JDBC is not actually
> persisting the data in UTF-8; is that correct or am I wildly off base,
> and if it is correct then is there anything I can do about it?!
>
> Many thanks,
>
> Ant.
>
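
One hypothesis that would fit the symptoms described above, stated as an assumption and not something established in this thread: the bytes arriving on the socket are decoded with a single-byte charset (for example the platform default) instead of UTF-8. Logging with the same charset reverses the mistake, so the log looks right, while the database receives each original byte re-encoded as its own character. The sketch below uses ISO-8859-1 as a stand-in for the wrong charset; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DoubleEncodingSketch {

    // Wrong: treats every UTF-8 byte as one ISO-8859-1 character.
    static String misdecode(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Right: decodes the socket bytes with the charset they were sent in.
    static String decode(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0391\u03b8\u03ae\u03bd\u03b1"; // Greek sample text
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String wrong = misdecode(wire);

        // Writing the misdecoded String back out with the same wrong charset
        // reproduces the original bytes, so a log viewed as UTF-8 looks fine.
        byte[] logged = wrong.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("log bytes match wire: " + Arrays.equals(logged, wire));

        // A UTF8 database would instead receive the misdecoded characters
        // encoded as UTF-8, which is a different and longer byte sequence.
        byte[] stored = wrong.getBytes(StandardCharsets.UTF_8);
        System.out.println("wire bytes=" + wire.length + ", stored bytes=" + stored.length);

        System.out.println("correct decode equals original: " + decode(wire).equals(original));
    }
}
```

If this is what is happening, decoding the socket stream explicitly, for instance with new InputStreamReader(in, "UTF-8"), would be the place to look.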
