Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From: Joseph Shraibman <jks(at)selectacast(dot)net>
To: Barry Lind <blind(at)xythos(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()
Date: 2003-01-08 19:51:38
Message-ID: 3E1C814A.1020307@selectacast.net
Lists: pgsql-general pgsql-jdbc

Well, this data was inserted into postgres through the JDBC driver in the first place.

So how come postgres itself didn't complain about non-ASCII data? How do I change the
encoding? And what will the side effects be?

Barry Lind wrote:
> Joseph,
>
> The problem is that your database claims to be ASCII, but you are
> storing non-ascii data in it.
>
> As of 7.3 the JDBC driver has the server convert from the database
> character set to UTF8. The server then sends the data to the driver in
> UTF8, and the driver decodes the UTF8 to Java Unicode.
>
> The conversion from ASCII to UTF8 is a no-op since the 128 characters of
> ASCII map directly to the same values in UTF8. However, since you are
> storing non-ASCII data, the bytes with values from 128 to 255
> just get passed from the server to the client without any additional
> processing (since there aren't supposed to be any values in this range),
> but then when the driver tries to convert to Java Unicode, it can't,
> because it has received an invalid UTF8 sequence.
>
> It seems that you are actually storing Latin1 data in this database and
> thus the database character set should probably be Latin1.
>
> In 7.2 it was possible to override the character set used by the driver,
> however I don't think this works anymore when connecting to a 7.3
> server. .... looks at code .... Yes, the override is ignored if the
> server is a 7.3 server. You could hack at AbstractJdbc1Connection to
> work around the issue, or just correctly set the database character set
> to match the data that the database contains.
>
> thanks,
> --Barry
>
>
> Joseph Shraibman wrote:
>
>> BTW the string that caused this is 'Oné'
>>
>> Joseph Shraibman wrote:
>>
>>> java.lang.ArrayIndexOutOfBoundsException: 3
>>>     at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:253)
>>>     at org.postgresql.core.Encoding.decode(Encoding.java:165)
>>>     at org.postgresql.core.Encoding.decode(Encoding.java:181)
>>>     at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
>>>
>>>
>>> The relevant code is:
>>>
>>> while (i < k) {
>>>     z = data[i] & 0xFF;
>>>     if (z < 0x80) {
>>>         l_cdata[j++] = (char)data[i];
>>>         i++;
>>>     } else if (z >= 0xE0) { // length == 3
>>>         y = data[i+1] & 0xFF; //<<== THIS IS LINE 253
>>>         x = data[i+2] & 0xFF;
>>>         val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
>>>         l_cdata[j++] = (char) val;
>>>         i+= 3;
>>>     } else { // length == 2 (maybe add checking for length > 3, throw exception if it is
>>>
>>>
>>> And in the method that calls that:
>>>
>>> if (encoding.equals("UTF-8")) {
>>>     return decodeUTF8(encodedString, offset, length);
>>> }
>>>
>>> The thing is my database encoding is SQL_ASCII
>>>
>>> => SELECT version(), getdatabaseencoding() ;
>>>
>>> version | getdatabaseencoding
>>> ---------------------------------------------------------------------------------------------------------+---------------------
>>>  PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7) | SQL_ASCII
>>> (1 row)
>>>
>>> ... so why is it trying to decode the string as UTF-8? I just
>>> upgraded this database from 7.2.3 yesterday.
>>>
>>
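
For reference, the failure described above can probably be reproduced outside
the driver. What follows is only a rough sketch, not the driver's code: the
class name Latin1AsUtf8Demo and the method naiveDecodeUtf8 are made up for
illustration, the loop is modeled on the excerpt quoted above, and the 2-byte
branch is an assumption because the quoted code is cut off at that point. In
Latin1, 'Oné' is the three bytes 0x4F 0x6E 0xE9. Since 0xE9 >= 0xE0, the loop
treats it as the start of a 3-byte sequence and reads data[3], one past the
end of the array, which would match the index 3 in the exception.

```java
import java.nio.charset.StandardCharsets;

public class Latin1AsUtf8Demo {

    // Hypothetical, simplified decoder modeled on the quoted loop.
    // Like the quoted code, it does no bounds or validity checking.
    static String naiveDecodeUtf8(byte[] data) {
        char[] out = new char[data.length];
        int i = 0;
        int j = 0;
        while (i < data.length) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {
                // Single byte: plain ASCII.
                out[j++] = (char) z;
                i++;
            } else if (z >= 0xE0) {
                // Treated as the lead byte of a 3-byte sequence.
                int y = data[i + 1] & 0xFF; // throws if no byte follows
                int x = data[i + 2] & 0xFF;
                out[j++] = (char) (((z - 0xE0) << 12) + ((y - 0x80) << 6) + (x - 0x80));
                i += 3;
            } else {
                // Assumed 2-byte branch; the quoted excerpt ends before this part.
                int y = data[i + 1] & 0xFF;
                out[j++] = (char) (((z - 0xC0) << 6) + (y - 0x80));
                i += 2;
            }
        }
        return new String(out, 0, j);
    }

    public static void main(String[] args) {
        // 'Oné' as it would be stored by a Latin1 client: 0x4F 0x6E 0xE9.
        byte[] latin1 = "On\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        try {
            naiveDecodeUtf8(latin1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("UTF-8 decode failed: " + e.getMessage());
        }

        // Decoding the same bytes as Latin1 recovers the original string.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding the same bytes as ISO-8859-1 gives back the original string, which
fits the suggestion that the data is really Latin1 and that the database
encoding should say so.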
