Re: Character Decoding Problems

From: Evan Tsue <evan(at)windsormgmt(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Character Decoding Problems
Date: 2003-08-13 17:11:42
Message-ID: 3595B54E-CDB1-11D7-A787-000A95A08104@windsormgmt.com
Lists: pgsql-jdbc

Ok, I think I've figured out the problem. I retract my statement that
the decodeUTF8 method is incorrectly implemented.

I'm still not exactly sure what the problem is. When I do a
getBytes("UTF16") on the string I get back from the JDBC query,
everything looks ok. However, when I do getBytes() it seems to default
to some other encoding. Does anyone know what the deal is with this?
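
For reference, String.getBytes() with no argument encodes using the
JVM's platform default charset, and characters that charset cannot
represent are replaced with its substitution byte, usually '?'. The
sketch below only illustrates that behaviour. It is not taken from the
driver, the class and method names are made up for illustration, and it
assumes a Java 7 or later runtime for StandardCharsets (on a 1.4 JVM the
charset would be passed by name instead).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultEncodingDemo {
    // Hypothetical helper: encode a string with an explicitly chosen charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Four Arabic letters, written as escapes so the source file
        // encoding does not affect the demo.
        String text = "\u0633\u0644\u0627\u0645";

        // The charset that a bare getBytes() call would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character: two bytes per Arabic letter here.
        System.out.println("UTF-8:    " + Arrays.toString(encode(text, StandardCharsets.UTF_8)));

        // US-ASCII cannot represent Arabic, so each character becomes
        // a single '?' byte (value 63).
        System.out.println("US-ASCII: " + Arrays.toString(encode(text, StandardCharsets.US_ASCII)));
    }
}
```

If the default charset printed on the affected machine cannot represent
Arabic, that would be consistent with question marks showing up on
standard output even though the String itself holds the right characters.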

The question that still remains is why the new String(...) constructor
works for me whereas the decodeUTF8 method does not.
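
One way to narrow this down is to run both decoders over the same bytes
and compare the results. The sketch below is a hypothetical hand-written
decoder in the style the quoted message describes (a reused char[]
buffer, then new String(buffer, 0, j)). It is not the driver's actual
code: the class name, buffer size and method signature are assumptions,
it handles only 1- to 3-byte sequences, and it does not validate its
input.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {
    // Reused between calls, like the l_cdata buffer described in the thread.
    private static char[] buffer = new char[64];

    // Hypothetical decoder: assumes well-formed UTF-8 with no 4-byte sequences.
    static synchronized String decodeUtf8(byte[] data, int offset, int length) {
        // The decoded text never has more chars than the input has bytes.
        if (buffer.length < length) {
            buffer = new char[length];
        }
        int i = offset;
        int end = offset + length;
        int j = 0; // number of chars written so far
        while (i < end) {
            int b = data[i++] & 0xFF;
            if (b < 0x80) {
                // One byte: plain ASCII.
                buffer[j++] = (char) b;
            } else if (b < 0xE0) {
                // Two bytes: 110xxxxx 10xxxxxx
                buffer[j++] = (char) (((b & 0x1F) << 6) | (data[i++] & 0x3F));
            } else {
                // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
                buffer[j++] = (char) (((b & 0x0F) << 12)
                        | ((data[i++] & 0x3F) << 6)
                        | (data[i++] & 0x3F));
            }
        }
        return new String(buffer, 0, j);
    }

    public static void main(String[] args) {
        // ASCII, Arabic and Chinese characters, written as escapes.
        String original = "abc \u0633\u0644\u0627\u0645 \u4E2D\u6587";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String manual = decodeUtf8(bytes, 0, bytes.length);
        String builtin = new String(bytes, StandardCharsets.UTF_8);

        System.out.println("manual matches built-in: " + manual.equals(builtin));
    }
}
```

Comparing the two strings with equals(), rather than printing them,
keeps the console encoding out of the picture.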

Btw, thanks for everybody's help so far.

Evan

On Tuesday, Aug 12, 2003, at 23:50 US/Eastern, Evan Tsue wrote:

> Ok, I've sat down with the problem a little bit more. It now seems to
> me that the decodeUTF8 method is doing the encoding correctly. It
> places the result from translating from UTF-8 to UTF-16 in the char[]
> l_cdata variable. It then creates a new String by calling
>
> new String(l_cdata, 0, j)
>
> I believe that the variable j is the length of the filled in portion
> of the l_cdata array. l_cdata is a class variable that is reused
> between method calls (the decodeUTF8 method is synchronized).
>
> This seems to be the problem. I haven't figured out why yet. I also
> have the same problem when running on FreeBSD (using the FreeBSD 1.4
> JVM).
>
> Evan
>
>
> On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:
>
>> I use pg73jdbc3.jar as the JDBC driver. It works fine.
>>
>>> Yes, it should work in 7.2.2. The decodeUTF8 method wasn't
>>> introduced until later. From the comments in the code, it seems
>>> that the reason for its inclusion was for performance.
>>>
>>> Evan
>>>
>>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern,
>>> <zy7111(at)mail(dot)china(dot)com> wrote:
>>>
>>>> I can insert Chinese text into PostgreSQL 7.2.2 and retrieve it
>>>> successfully, with both operations going through JDBC.
>>>> It seems you insert text using psql and retrieve it using JDBC.
>>>>
>>>> ----- Original Message -----
>>>> From: "Evan Tsue" <evan(at)windsormgmt(dot)com>
>>>> To: <pgsql-jdbc(at)postgresql(dot)org>
>>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>>> Subject: [JDBC] Character Decoding Problems
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been having problems decoding non-Latin characters using the
>>>>> Postgres JDBC driver. Here's the situation: I'm using postgres
>>>>> 7.3.2 and I've created a test database using 'createdb -E UNICODE
>>>>> testdb' to ensure that I really am using the UNICODE character
>>>>> set. Using psql, I created a table using the following command:
>>>>> 'CREATE TABLE messages (message_uid SERIAL PRIMARY KEY,
>>>>> message_text VARCHAR(255))' to test character encoding and
>>>>> decoding. At that point, I inserted a message that was in English.
>>>>> I also inserted a message that was in Arabic. I did a select on
>>>>> that table using psql and the values came back perfectly (I'm
>>>>> using MacOS X, so the characters are displayed correctly).
>>>>> Next, I did a select on the same table via JDBC. All I had the
>>>>> program do was select on the table and print the results out to
>>>>> standard output. The message in English was displayed perfectly.
>>>>> However, the message that was in Arabic was displayed as a series
>>>>> of question marks and spaces.
>>>>> I eventually navigated my way through the JDBC driver source to
>>>>> find that the problem is in the decodeUTF8 method in the
>>>>> org.postgresql.core.Encoding class. Apparently, it doesn't seem to
>>>>> be working properly for non-Western characters. I replaced the
>>>>> call to that method with a call to the java.lang.String
>>>>> constructor and now everything works perfectly.
>>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>>> Russian and Korean text and inserted it into the database. Using
>>>>> the original driver, I get the question marks. But, when I used
>>>>> the String constructor, everything comes out fine.
>>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>>> replace the call to that with a call to the String constructor?
>>>>>
>>>>> Thanks,
>>>>> Evan
>>>>>
>>>>>
>>>>> ---------------------------(end of
>>>>> broadcast)---------------------------
>>>>> TIP 8: explain analyze is your friend
>>>>>
>>>>
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>> (send "unregister YourEmailAddressHere" to
>>> majordomo(at)postgresql(dot)org)
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 1: subscribe and unsubscribe commands go to
>> majordomo(at)postgresql(dot)org
>>
>
>
>
