Re: Character Decoding Problems

From: Barry Lind <blind(at)xythos(dot)com>
To: Evan Tsue <evan(at)windsormgmt(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Character Decoding Problems
Date: 2003-08-13 18:25:46
Message-ID: 3F3A82AA.7070906@xythos.com
Lists: pgsql-jdbc

Evan,

A call to getBytes() without specifying a character set will use the
default encoding for the JVM. I believe the way the JVM determines its
default encoding is platform dependent. In my environments the default
JVM encoding is LATIN1.
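As a minimal illustration of the point above (a sketch, not part of the original reply; the class name DefaultCharsetDemo, the helper roundTrip and the Arabic sample string are arbitrary choices), the Java program below checks three things. First, getBytes() with no argument gives the same bytes as getBytes(Charset.defaultCharset()). Second, encoding Arabic text as ISO-8859-1 (LATIN1) replaces every character with '?'. Third, UTF-8 and UTF-16 round-trip the text without loss. It uses java.nio.charset.StandardCharsets, which needs Java 7 or later. Since JDK 18 the default charset is UTF-8 on most platforms, so the first line printed depends on the JVM you run it on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Encodes s with the given charset, then decodes the bytes with the same charset
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Arabic "marhaba" written as Unicode escapes, so the source file's encoding doesn't matter
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // getBytes() with no argument uses the JVM's default charset
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.equals(arabic.getBytes(), arabic.getBytes(Charset.defaultCharset()))); // true

        // LATIN1 has no Arabic letters, so each one is replaced by '?'
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1)); // ?????

        // UTF-8 and UTF-16 can represent every character, so nothing is lost
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));  // true
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_16).equals(arabic)); // true
    }
}
```

If the default charset happens to be LATIN1, as in the environments described above, any code path that calls getBytes() without a charset loses non-Latin characters in the same way.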

thanks,
--Barry

Evan Tsue wrote:
> Ok, I think I've figured out the problem. I retract my statement that
> the decodeUTF8 method is incorrectly implemented.
>
> I'm still not exactly sure what the problem is. When I call
> getBytes("UTF16") on the string I get back from the JDBC query,
> everything looks OK. However, when I call getBytes() it seems to
> default to some other encoding. Does anyone know why this happens?
>
> The issue that still remains is why the new String(...) constructor
> works for me whereas the decodeUTF8 method does not.
>
> Btw, thanks for everybody's help so far.
>
> Evan
>
> On Tuesday, Aug 12, 2003, at 23:50 US/Eastern, Evan Tsue wrote:
>
>> Ok, I've sat down with the problem a little more. It now seems to me
>> that the decodeUTF8 method is doing the decoding correctly. It places
>> the result of translating from UTF-8 to UTF-16 in the char[] l_cdata
>> variable. It then creates a new String by calling
>>
>> new String(l_cdata, 0, j)
>>
>> I believe that the variable j is the length of the filled-in portion
>> of the l_cdata array. l_cdata is a class variable that is reused
>> between method calls (the decodeUTF8 method is synchronized).
>>
>> This seems to be where the problem is, though I haven't figured out
>> why yet. I see the same problem when running on FreeBSD (using the
>> FreeBSD 1.4 JVM).
>>
>> Evan
>>
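To make the decoding path described above concrete, here is a hypothetical Java sketch of a hand-written UTF-8 decoder with the same shape. It decodes into a char[] field that is reused between calls (hence synchronized), counts the filled chars in j, and finishes with new String(cdata, 0, j). It is not the driver's actual decodeUTF8 code. The class name Utf8DecoderSketch and the field cdata are invented for illustration, and the sketch assumes well-formed UTF-8 input because it does no validation. Comparing its output with new String(bytes, StandardCharsets.UTF_8) is one way to check whether a custom decoder is at fault.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: NOT the PostgreSQL driver's code. Assumes well-formed UTF-8 input.
public class Utf8DecoderSketch {
    // Buffer reused between calls, so access to it must be synchronized
    private char[] cdata = new char[64];

    public synchronized String decodeUTF8(byte[] data, int offset, int length) {
        // A UTF-8 sequence of n bytes yields at most n UTF-16 chars, so 'length' chars always suffice
        if (cdata.length < length) {
            cdata = new char[length];
        }
        int i = offset;
        int end = offset + length;
        int j = 0; // number of chars written into cdata so far
        while (i < end) {
            int b = data[i] & 0xFF;
            int cp;
            if (b < 0x80) {            // 0xxxxxxx: ASCII
                cp = b;
                i += 1;
            } else if (b < 0xE0) {     // 110xxxxx 10xxxxxx
                cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F);
                i += 2;
            } else if (b < 0xF0) {     // 1110xxxx 10xxxxxx 10xxxxxx
                cp = ((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F);
                i += 3;
            } else {                   // 11110xxx followed by three continuation bytes
                cp = ((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
                        | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F);
                i += 4;
            }
            // Writes one char, or a surrogate pair for code points above U+FFFF
            j += Character.toChars(cp, cdata, j);
        }
        // Only the first j chars belong to this call; the rest may be stale data from earlier calls
        return new String(cdata, 0, j);
    }

    public static void main(String[] args) {
        Utf8DecoderSketch decoder = new Utf8DecoderSketch();
        String[] samples = {
            "hello",
            "\u0645\u0631\u062D\u0628\u0627",       // Arabic
            "\u4F60\u597D",                         // Chinese
            "\u041F\u0440\u0438\u0432\u0435\u0442", // Russian
            "\uD83D\uDE00"                          // U+1F600, a supplementary character
        };
        for (String s : samples) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            String mine = decoder.decodeUTF8(utf8, 0, utf8.length);
            String jdk = new String(utf8, StandardCharsets.UTF_8);
            System.out.println(mine.equals(jdk) && mine.equals(s)); // true for each sample
        }
    }
}
```

Because the returned String copies only the first j chars, reusing the buffer is safe here even when a short input follows a long one. A decoder in this style would go wrong if j or the byte offsets were computed incorrectly.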
>>
>> On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:
>>
>>> I use pg73jdbc3.jar as the JDBC driver. It works fine.
>>>
>>>> Yes, it should work in 7.2.2. The decodeUTF8 method wasn't introduced
>>>> until later. From the comments in the code, it seems that it was
>>>> added for performance reasons.
>>>>
>>>> Evan
>>>>
>>>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111(at)mail(dot)china(dot)com> wrote:
>>>>
>>>>> I can insert and retrieve Chinese text in PostgreSQL 7.2.2
>>>>> successfully, with both operations going through JDBC.
>>>>> It seems you inserted the text using psql and retrieved it using JDBC.
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Evan Tsue" <evan(at)windsormgmt(dot)com>
>>>>> To: <pgsql-jdbc(at)postgresql(dot)org>
>>>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>>>> Subject: [JDBC] Character Decoding Problems
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been having problems decoding non-Latin characters using the
>>>>>> Postgres JDBC driver. Here's the situation: I'm using PostgreSQL
>>>>>> 7.3.2, and I've created a test database using 'createdb -E UNICODE
>>>>>> testdb' to ensure that I really am using the UNICODE character set.
>>>>>> Using psql, I created a table with the following command:
>>>>>> 'CREATE TABLE messages (message_uid SERIAL PRIMARY KEY, message_text
>>>>>> VARCHAR(255))' to test character encoding and decoding. I then
>>>>>> inserted one message in English and one in Arabic. I did a select on
>>>>>> that table using psql and the values came back perfectly (I'm using
>>>>>> Mac OS X, so the characters are displayed correctly).
>>>>>>
>>>>>> Next, I did a select on the same table via JDBC. The program simply
>>>>>> selected from the table and printed the results to standard output.
>>>>>> The English message was displayed perfectly. However, the Arabic
>>>>>> message was displayed as a series of question marks and spaces.
>>>>>>
>>>>>> I eventually navigated my way through the JDBC driver source and
>>>>>> found that the problem is in the decodeUTF8 method of the
>>>>>> org.postgresql.core.Encoding class. It doesn't seem to work properly
>>>>>> for non-Western characters. I replaced the call to that method with a
>>>>>> call to the java.lang.String constructor, and now everything works
>>>>>> perfectly.
>>>>>>
>>>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>>>> Russian and Korean text and inserted it into the database. With the
>>>>>> original driver, I get question marks. But when I use the String
>>>>>> constructor, everything comes out fine.
>>>>>>
>>>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>>>> replace the call to that with a call to the String constructor?
>>>>>>
>>>>>> Thanks,
>>>>>> Evan
>>>>>>
>>>>>>
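One common way non-Latin text turns into question marks on standard output, independent of the driver, is printing it through a stream whose encoding cannot represent the characters. The hypothetical Java sketch below prints the same Arabic text through an ISO-8859-1 stream and a UTF-8 stream and compares the bytes written. The class PrintEncodingDemo and the helper printWith are illustrative names. The sketch uses the PrintStream(OutputStream, boolean, Charset) constructor, which requires Java 10 or later. It covers only the output side and does not by itself account for the difference between decodeUTF8 and the String constructor reported above.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PrintEncodingDemo {
    // Prints text through a PrintStream that uses the given charset and returns the bytes it wrote
    static byte[] printWith(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, cs);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Two Arabic words separated by an ASCII space
        String arabic = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0643";

        // Unmappable characters are replaced with '?', while the space survives
        byte[] latin1 = printWith(arabic, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ????? ??

        // A UTF-8 stream writes bytes that decode back to the original text
        byte[] utf8 = printWith(arabic, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(arabic)); // true

        // Wrapping System.out like this helps only if the terminal also expects UTF-8
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(arabic);
    }
}
```

The pattern of question marks separated by spaces is what an encoding mismatch on output typically looks like. That is one reason to check the output stream's encoding before suspecting the driver.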
>>>>>> ---------------------------(end of broadcast)---------------------------
>>>>>> TIP 8: explain analyze is your friend
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
