Re: Charset encoding patch to JDBC driver

From: Javier Yáñez <javier(at)cibal(dot)es>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Charset encoding patch to JDBC driver
Date: 2005-03-17 09:50:33
Message-ID: 423952E9.4010308@cibal.es
Lists: pgsql-jdbc

Oliver Jowett wrote:

> I'm uncomfortable with applying this sort of patch to the official
> driver, since it makes the driver more complex just to handle what is
> arguably a database misconfiguration. It also introduces a new class of
> error: a mismatch between the driver's configured charSet and the actual
> database.

I think that this patch is necessary to solve some real-world problems.
In my particular case I have to write a J2EE application that accesses
an existing database. This database uses SQL_ASCII encoding. With the
current version of pgjdbc, when the result of a query contains an 8-bit
character (very common in Spanish), this error appears:

org.postgresql.util.PSQLException: Invalid character data was found.
This is most likely caused by stored data containing characters that are
invalid for the character set the database was created in. The most
common example of this is storing 8bit data in a SQL_ASCII database.
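
As a rough illustration of why this happens (a minimal sketch, not the
driver's actual code): a SQL_ASCII database returns the stored bytes
unchanged, so if another client wrote them as ISO-8859-1 and the reader
decodes them strictly as UTF-8, the decode fails. The class and method
names below are hypothetical, and ISO-8859-1 as the writing client's
encoding is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not exist in pgjdbc.
public class EncodingMismatchDemo {

    // Decode strictly: report malformed input instead of substituting U+FFFD.
    public static String decodeStrict(byte[] data, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    // True when the bytes form a valid sequence in the given charset.
    public static boolean canDecode(byte[] data, Charset charset) {
        try {
            decodeStrict(data, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bytes as an ISO-8859-1 client would have stored "Yáñez" (assumption).
        byte[] stored = "Y\u00e1\u00f1ez".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("valid as ISO-8859-1: "
                + canDecode(stored, StandardCharsets.ISO_8859_1));
        System.out.println("valid as UTF-8: "
                + canDecode(stored, StandardCharsets.UTF_8));
    }
}
```

The same bytes are readable once the decoder is told the encoding that
was really used to write them, which is what a configurable charset in
the driver would allow.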

Many people have similar problems:

http://www.google.es/search?q=%22Invalid+character+data+was+found%22&hl=es&lr=&start=10&sa=N

http://linux.kieser.net/java_pg_unicode.html

I cannot tell my customer to change the database encoding, because
other (non-Java) applications might stop working or show strange
characters.

On the other hand, I do not think that using SQL_ASCII encoding is a
database misconfiguration, nor that storing 8-bit data in a SQL_ASCII
database is incorrect. Other applications are using the same database
through ODBC without problems.

> Comments on the patch itself:
>
> - it is missing changes to the v2 protocol path

I have not tested it, but I think that the v2 protocol path already
allows the encoding to be chosen.

> - why does it remove the client_encoding sanity check on connect?

My intention was only to remove the check that client_encoding is equal
to UNICODE. I agree that client_encoding should still be checked.

> - since encoding does not change for the lifetime of the connection,
> can't you make the encoding a field of QueryExecutorImpl rather than
> passing it around everywhere?

I agree.
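
A minimal sketch of that suggestion, assuming a simplified executor:
the encoding is set once when the connection is created and kept in a
final field, so the methods that convert text no longer take it as an
argument. The class and its methods are hypothetical stand-ins, not the
real driver classes.

```java
import java.nio.charset.Charset;

// Hypothetical stand-in for the driver's executor; not the real pgjdbc class.
public class ExecutorSketch {

    // Fixed for the lifetime of the connection, so a final field is enough.
    private final Charset encoding;

    public ExecutorSketch(Charset encoding) {
        this.encoding = encoding;
    }

    // Callers pass only the text; the executor applies its own encoding.
    public byte[] encodeQuery(String sql) {
        return sql.getBytes(encoding);
    }

    // Decode one field value received from the server.
    public String decodeField(byte[] raw) {
        return new String(raw, encoding);
    }
}
```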

> - it may be better to pass encoding as a parameter to
> SimpleParameterList methods that need it, rather than storing the (same)
> value on every list instance.

I agree too. The encoding object is only used in 2 methods.
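
A sketch of how that could look, under the assumption that only the
methods producing bytes need the encoding: the list stores plain
values, and the caller supplies the charset at the moment of
conversion. The class and method names are hypothetical and do not
match the real SimpleParameterList.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parameter list; names do not match the real driver class.
public class ParameterListSketch {

    // No encoding is stored here; every instance holds only its values.
    private final List<String> values = new ArrayList<>();

    public void setString(String value) {
        values.add(value);
    }

    // The encoding is an argument only where bytes are actually produced.
    public byte[] encode(int index, Charset encoding) {
        return values.get(index).getBytes(encoding);
    }

    // Encoded length of one parameter, as needed when sizing a message.
    public int encodedLength(int index, Charset encoding) {
        return encode(index, encoding).length;
    }
}
```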

I'm going to try to improve the patch and post it.

Thank you for your time!

Javier Yáñez

--
CIBAL Multimedia S.L.
Edificio 17, C-10
ParcBIT
Camino de Can Manuel s/n
07120 - Palma de Mallorca
Spain
