Re: Character Decoding Problems

From: Evan Tsue <evan(at)windsormgmt(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Character Decoding Problems
Date: 2003-08-13 03:50:16
Message-ID: 40249E56-CD41-11D7-A787-000A95A08104@windsormgmt.com
Lists: pgsql-jdbc
Ok, I've sat down with the problem a little more.  It now seems to me
that the decodeUTF8 method is doing the decoding correctly.  It places
the result of translating from UTF-8 to UTF-16 in the char[] variable
l_cdata.  It then creates a new String by calling

	new String(l_cdata, 0, j)

I believe that the variable j is the length of the filled-in portion of
the l_cdata array.  l_cdata is a class variable that is reused between
method calls (the decodeUTF8 method is synchronized).
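
To make the pattern concrete, here is a minimal sketch of the approach as I
understand it.  It is not the driver's code: the class name, the buffer
field and the error handling are all assumptions of mine, and it only
covers 1-, 2- and 3-byte sequences.  The main method compares the result
against the plain String constructor.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the driver's decoder; not the actual source.
class Utf8BufferSketch {
    // Plays the role of the shared l_cdata buffer that is reused across calls.
    private static char[] buffer = new char[64];

    // Decodes 1-, 2- and 3-byte UTF-8 sequences into the shared buffer.
    // Input is assumed to be well formed; truncated sequences are not handled.
    static synchronized String decode(byte[] data, int offset, int length) {
        if (buffer.length < length) {
            // Each char consumes at least one input byte, so this is enough.
            buffer = new char[length];
        }
        int i = offset;
        int end = offset + length;
        int j = 0; // number of chars written so far
        while (i < end) {
            int b = data[i++] & 0xFF;
            if (b < 0x80) {
                buffer[j++] = (char) b;
            } else if (b >= 0xC0 && b < 0xE0) {
                buffer[j++] = (char) (((b & 0x1F) << 6) | (data[i++] & 0x3F));
            } else if (b >= 0xE0 && b < 0xF0) {
                int b2 = data[i++] & 0x3F;
                int b3 = data[i++] & 0x3F;
                buffer[j++] = (char) (((b & 0x0F) << 12) | (b2 << 6) | b3);
            } else {
                throw new IllegalArgumentException("unsupported lead byte: " + b);
            }
        }
        // The String constructor copies the first j chars, so reusing the
        // buffer afterwards does not change strings that were already built.
        return new String(buffer, 0, j);
    }

    public static void main(String[] args) {
        // Five Arabic letters, written as escapes to keep the source ASCII.
        String text = "\u0645\u0631\u062D\u0628\u0627";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String viaSketch = decode(bytes, 0, bytes.length);
        String viaString = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(viaSketch.equals(viaString));
    }
}
```

If a decoder of this shape and the String constructor agree on the same
bytes, the difference has to come from somewhere other than the bit
arithmetic.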

This seems to be the problem.  I haven't figured out why yet.  I also
have the same problem when running on FreeBSD (using the FreeBSD 1.4
JVM).
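
One way to narrow this down might be to print the UTF-16 code units of the
returned String instead of the String itself.  That would show whether the
question marks (U+003F) are already in the String or only appear on
output.  The helper below is a hypothetical sketch of mine and is not part
of the driver.

```java
// Hypothetical debugging helper; not part of the JDBC driver.
class CodeUnitDump {
    // Renders each UTF-16 code unit of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An Arabic letter followed by a literal question mark.
        System.out.println(dump("\u0645?")); // prints "U+0645 U+003F"
    }
}
```

Running dump on the value returned by ResultSet.getString would then show
exactly which characters the driver produced.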

Evan


On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:

> I use pg73jdbc3.jar as JDBC driver. It works fine.
>
>> Yes, it should work in 7.2.2.  The decodeUTF8 method wasn't introduced
>> until later.  From the comments in the code, it seems that the reason
>> for its inclusion was for performance.
>>
>> Evan
>>
>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111(at)mail(dot)china(dot)com>
>> wrote:
>>
>>> I can insert Chinese text into PostgreSQL 7.2.2 and retrieve it
>>> successfully, with both operations going through JDBC.
>>> It seems you inserted the text using psql and retrieved it using JDBC.
>>>
>>> ----- Original Message -----
>>> From: "Evan Tsue" <evan(at)windsormgmt(dot)com>
>>> To: <pgsql-jdbc(at)postgresql(dot)org>
>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>> Subject: [JDBC] Character Decoding Problems
>>>
>>>
>>>> Hi,
>>>>
>>>> I've been having problems decoding non-Latin characters using the
>>>> Postgres JDBC driver.  Here's the situation:  I'm using postgres 7.3.2
>>>> and I've created a test database using 'createdb -E UNICODE testdb' to
>>>> ensure that I really am using the UNICODE character set.  Using psql, I
>>>> created a table using the following command: 'CREATE TABLE messages
>>>> (message_uid SERIAL PRIMARY KEY, message_text VARCHAR(255))' to test
>>>> character encoding and decoding.  At that point, I inserted a message
>>>> that was in English.  I also inserted a message that was in Arabic.  I
>>>> did a select on that table using psql and the values came back
>>>> perfectly (I'm using MacOS X, so the characters are displayed
>>>> correctly).
>>>> Next, I did a select on the same table via JDBC.  All I had the
>>>> program do was select on the table and print the results out to
>>>> standard output.  The message in English was displayed perfectly.
>>>> However, the message that was in Arabic was displayed as a series of
>>>> question marks and spaces.
>>>> I eventually navigated my way through the JDBC driver source to find
>>>> that the problem is in the decodeUTF8 method in the
>>>> org.postgresql.core.Encoding class.  Apparently, it doesn't seem to be
>>>> working properly for non-Western characters.  I replaced the call to
>>>> that method with a call to the java.lang.String constructor and now
>>>> everything works perfectly.
>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>> Russian and Korean text and inserted it into the database.  Using the
>>>> original driver, I get the question marks.  But, when I used the String
>>>> constructor, everything comes out fine.
>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>> replace the call to that with a call to the String constructor?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>>
>>>> ---------------------------(end of broadcast)---------------------------
>>>> TIP 8: explain analyze is your friend
>>>>