UNICODE encoding and jdbc related issues

From: Chris Kratz <chris(dot)kratz(at)vistashare(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: UNICODE encoding and jdbc related issues
Date: 2005-04-04 15:28:00
Message-ID: 200504041128.00634.chris.kratz@vistashare.com
Lists: pgsql-general

Our production database was created with the default SQL_ASCII encoding. It
appears that some of our users have entered characters above 127 (accented
vowels, etc.) into the system. Until recently, none of the tools we use had a
problem with this; everything just worked.

I was testing some reporting tools this past weekend and have been playing
with JasperReports[1]. JasperReports is a Java-based reporting tool that
reads data from the database via JDBC. When I initially tried to generate
reports, the JDBC connection would fail with the following message:

org.postgresql.util.PSQLException: Invalid character data was found.

Googling eventually turned up a message on the pgsql-jdbc list detailing the
problem[2]. Basically, Java cannot convert these characters above 127 into
Unicode, which Java requires for its strings.
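
To illustrate what I think is happening at the byte level (a Python sketch
only, not the JDBC driver's actual code; the sample value and the helper name
decodes_as_utf8 are made up for illustration): a character above 127 stored
as a single ISO-8859-1 byte is not a valid UTF-8 sequence, so a strict UTF-8
decoder rejects it.

```python
# Illustration only: a strict UTF-8 decode of bytes that were stored as
# ISO-8859-1 in a SQL_ASCII database. The sample value is hypothetical.

def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "caf\u00e9" is "cafe" with an accented e (code point 233, above 127).
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # one byte for the accented e
utf8_bytes = "caf\u00e9".encode("utf-8")         # two bytes for the accented e

print(decodes_as_utf8(latin1_bytes))  # False
print(decodes_as_utf8(utf8_bytes))    # True
print(decodes_as_utf8(b"cafe"))       # True: pure ASCII is also valid UTF-8
```

That would explain why rows containing only plain ASCII come through fine and
only the rows with accented characters trigger the exception.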

After some more googling, I found that if I took a recent database dump, ran
it through iconv[3], and restored it into a database created with a Unicode
encoding, everything worked.
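
For anyone without iconv handy, the conversion step in [3] can be sketched in
Python roughly as follows. This assumes, as the iconv command does, that the
existing data really is ISO-8859-1; the function names latin1_to_utf8 and
convert_stream are just illustrative.

```python
import io


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (same idea as the iconv call in [3])."""
    return raw.decode("iso-8859-1").encode("utf-8")


def convert_stream(src, dst, chunk_size=64 * 1024):
    """Copy a binary stream, re-encoding each chunk.

    Chunking is safe here because ISO-8859-1 is a single-byte encoding,
    so a chunk boundary can never split a character.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(latin1_to_utf8(chunk))


# Small in-memory demonstration with a made-up line of dump text.
src = io.BytesIO(b"INSERT INTO t VALUES ('caf\xe9');\n")
dst = io.BytesIO()
convert_stream(src, dst)
print(dst.getvalue())  # b"INSERT INTO t VALUES ('caf\xc3\xa9');\n"
```

One caveat with this approach: decoding as ISO-8859-1 never fails, because
every byte value maps to some character. If part of the data was actually
entered in a different encoding, the conversion will succeed silently and
produce the wrong characters.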

1. Is there any way to do an iconv-type translation inline in a SQL statement,
i.e. select translate(text_field, unicode) from table....? Btw, set
client_encoding=UNICODE does not work in this situation. In fact, the JDBC
driver for postgres seems to set this automatically.

2. I'm really not sure I want to change the encoding of our main database to
Unicode. Is there a performance loss when going to a UNICODE database
encoding? What about sorts, etc.?

3. Is there any other way around this issue? Or are we living dangerously by
trying to store non-ASCII data in a database created with an ASCII encoding?

4. Has anyone else gone through a conversion like this? Are there any gotchas
we should look out for?

Thanks,

-Chris

We are using postgres 7.4.5 on Linux.

[1] http://jasperreports.sourceforge.net/
[2] http://archives.postgresql.org/pgsql-jdbc/2004-10/msg00280.php
[3] iconv -f iso8859-1 -t utf-8 < dbsnapshot.dumpall > dump-utf-8.dumpall
--
Chris Kratz
