Re: Character Decoding Problems

From: Evan Tsue <evan(at)windsormgmt(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Character Decoding Problems
Date: 2003-08-13 03:50:16
Message-ID: 40249E56-CD41-11D7-A787-000A95A08104@windsormgmt.com
Lists: pgsql-jdbc

Ok, I've sat down with the problem a little more. It now seems to me
that the decodeUTF8 method is doing the decoding correctly: it places
the result of translating from UTF-8 to UTF-16 in the char[] variable
l_cdata. It then creates a new String by calling

new String(l_cdata, 0, j)

I believe that j is the length of the filled-in portion of the l_cdata
array. l_cdata is a class variable that is reused between method calls
(the decodeUTF8 method is synchronized).

This seems to be where the problem lies, but I haven't figured out why
yet. I also have the same problem when running on FreeBSD (using the
FreeBSD 1.4 JVM).
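The decoding step described above can be sketched as a hand-written UTF-8 decoder that fills a shared, reused char[] buffer and then returns new String(buffer, 0, j). This is a minimal illustration under stated assumptions, not the driver's actual Encoding.decodeUTF8 source: the class name Utf8BufferDecoder, the field name cdata and the method signature are hypothetical, and the input is assumed to be well-formed UTF-8 (continuation bytes are not validated).

```java
// Hypothetical sketch, NOT the PostgreSQL driver's Encoding.decodeUTF8:
// decodes UTF-8 into a reusable char[] and copies out only the filled prefix,
// mirroring the new String(l_cdata, 0, j) step. Assumes well-formed UTF-8.
final class Utf8BufferDecoder {
    // Scratch buffer shared between calls; guarded by the synchronized method.
    private static char[] cdata = new char[1024];

    static synchronized String decodeUTF8(byte[] data, int offset, int length) {
        // UTF-8 never yields more UTF-16 chars than input bytes
        // (1, 2 or 3 bytes -> 1 char; 4 bytes -> 2 chars), so 'length' suffices.
        if (cdata.length < length) {
            cdata = new char[length];
        }
        int i = offset;
        int end = offset + length;
        int j = 0; // number of chars written into cdata so far
        while (i < end) {
            int b = data[i] & 0xFF;
            int cp;
            if (b < 0x80) {            // 0xxxxxxx
                cp = b;
                i += 1;
            } else if (b < 0xE0) {     // 110xxxxx 10xxxxxx
                cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F);
                i += 2;
            } else if (b < 0xF0) {     // 1110xxxx 10xxxxxx 10xxxxxx
                cp = ((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6)
                        | (data[i + 2] & 0x3F);
                i += 3;
            } else {                   // 11110xxx followed by 3 continuation bytes
                cp = ((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
                        | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F);
                i += 4;
            }
            // Writes 1 char, or a surrogate pair for code points above U+FFFF.
            j += Character.toChars(cp, cdata, j);
        }
        // Only the first j chars belong to this call's result.
        return new String(cdata, 0, j);
    }
}
```

Because cdata is shared between calls, the method has to stay synchronized, and only the first j chars may be copied out; passing cdata.length instead of j would leak leftover characters from an earlier, longer call.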

Evan

On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:

> I use pg73jdbc3.jar as the JDBC driver. It works fine.
>
>> Yes, it should work in 7.2.2. The decodeUTF8 method wasn't introduced
>> until later. From the comments in the code, it seems it was included
>> for performance reasons.
>>
>> Evan
>>
>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111(at)mail(dot)china(dot)com> wrote:
>>
>>> I can insert and retrieve Chinese text in PostgreSQL 7.2.2 successfully,
>>> with both operations done through JDBC.
>>> It seems you inserted the text using psql and retrieved it using JDBC.
>>>
>>> ----- Original Message -----
>>> From: "Evan Tsue" <evan(at)windsormgmt(dot)com>
>>> To: <pgsql-jdbc(at)postgresql(dot)org>
>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>> Subject: [JDBC] Character Decoding Problems
>>>
>>>
>>>> Hi,
>>>>
>>>> I've been having problems decoding non-Latin characters using the
>>>> Postgres JDBC driver. Here's the situation: I'm using PostgreSQL
>>>> 7.3.2, and I created a test database using
>>>> 'createdb -E UNICODE testdb' to ensure that I really am using the
>>>> UNICODE character set. Using psql, I created a table to test
>>>> character encoding and decoding with the following command:
>>>> 'CREATE TABLE messages (message_uid SERIAL PRIMARY KEY, message_text VARCHAR(255))'.
>>>> I then inserted one message in English and one in Arabic. When I did
>>>> a select on that table using psql, the values came back perfectly
>>>> (I'm using Mac OS X, so the characters are displayed correctly).
>>>>
>>>> Next, I did a select on the same table via JDBC. The program only
>>>> selected from the table and printed the results to standard output.
>>>> The English message was displayed perfectly, but the Arabic message
>>>> was displayed as a series of question marks and spaces.
>>>>
>>>> I eventually worked my way through the JDBC driver source and found
>>>> that the problem is in the decodeUTF8 method of the
>>>> org.postgresql.core.Encoding class. It doesn't seem to work properly
>>>> for non-Western characters. I replaced the call to that method with
>>>> a call to the java.lang.String constructor, and now everything works
>>>> perfectly.
>>>>
>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>> Russian and Korean text and inserted it into the database. With the
>>>> original driver I get the question marks, but with the String
>>>> constructor everything comes out fine.
>>>>
>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>> replace the call to it with a call to the String constructor?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 8: explain analyze is your friend
>>>>
>>>
>>
>>
>
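The workaround reported in the original message, replacing the decodeUTF8 call with the java.lang.String constructor, amounts to something like the sketch below. It is not the actual patch: the class name StringCtorDecoder and its methods are hypothetical, and it uses java.nio.charset.StandardCharsets (Java 7+), whereas a 1.4-era JVM would pass the charset name "UTF-8".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the workaround discussed in this thread:
// delegate UTF-8 decoding to the java.lang.String constructor.
// StringCtorDecoder is an illustrative name, not part of the JDBC driver.
final class StringCtorDecoder {

    // Decodes 'length' bytes starting at 'offset' as UTF-8.
    // On a 1.4-era JVM the equivalent is
    // new String(data, offset, length, "UTF-8"), which declares
    // java.io.UnsupportedEncodingException.
    static String decodeUTF8(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    // True if encoding s as UTF-8 and decoding it again gives back s,
    // e.g. for sample Arabic, Chinese, Japanese, Russian or Korean text.
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return s.equals(decodeUTF8(utf8, 0, utf8.length));
    }
}
```

This gives up whatever performance benefit motivated the hand-written decoder, in exchange for relying on the JVM's own UTF-8 handling.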
