Re: Selecting on non ASCII varchars

From: Jeremy LaCivita <jlacivita(at)broadrelay(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <oliver(at)opencloud(dot)com>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Selecting on non ASCII varchars
Date: 2005-10-04 21:08:45
Message-ID: 0061FE85-849C-46C8-8A53-B6BCE32C4C54@broadrelay.com
Lists: pgsql-jdbc

Makes sense. So I guess when my original UTF-8 byte stream comes to
Flash, it's creating a new String with the default encoding out of
those bytes.

It makes sense, then, that getting the bytes again using the default
encoding and recreating the String as UTF-8 would fix it.

I agree it's risky though, since it's not all under my control.
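
To illustrate the risk, here is a minimal sketch (assuming Java 7 or later
for StandardCharsets; the class and method names are made up for this
example and are not from the thread). It simulates UTF-8 bytes being decoded
with the wrong charset, then applies the workaround with the charset named
explicitly rather than taken from the platform default. The repair only works
when the wrong charset maps every byte to a character and back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Simulates the original mistake: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String original, Charset wrong) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    // The workaround from this thread, with the charset passed in
    // instead of relying on the platform default.
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // ISO-8859-1 maps all 256 byte values to characters, so nothing is lost.
        String viaLatin1 = repair(
                misdecode(original, StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1.equals(original)); // true

        // US-ASCII has no characters for bytes >= 0x80; they are replaced
        // during decoding and cannot be recovered afterwards.
        String viaAscii = repair(
                misdecode(original, StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII);
        System.out.println(viaAscii.equals(original)); // false
    }
}
```

So whether str.getBytes() gives back the original bytes depends on the
default encoding of the JVM that happens to run the code.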

I'll have to find out where the strings are getting created initially.
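
Once that place is found, the usual fix is to name the charset where the
bytes first become characters. A rough sketch, assuming the data arrives as
an InputStream of UTF-8 bytes (the class and method names here are
hypothetical, and the real entry point may look quite different):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Source {

    // Decodes the whole stream as UTF-8, independent of the platform default.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the incoming request body: the UTF-8 bytes of "caf\u00e9".
        byte[] incoming = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(incoming));
        System.out.println(decoded.equals("caf\u00e9")); // true
    }
}
```

With the String decoded correctly at the source, no repair step is needed
before passing it to the PreparedStatement.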

Thanks!

-jl

On Oct 4, 2005, at 4:37 PM, Kevin Grittner wrote:

> A String object doesn't contain an array of bytes; it contains an
> array of characters. Somehow you created String objects from
> bytes using the wrong character encoding technique (not to be
> confused with a character set). Your str.getBytes() is using the
> default encoding scheme to convert the characters to bytes. In
> this case, it seems that all the characters are mapping back to
> the original bytes, although I don't think that's always necessarily
> going to happen. By specifying the "utf-8" in the String
> constructor, you're telling it to use a specific encoding technique
> to convert those bytes to characters.
>
> There is nothing in a String object to "flag" it for any particular
> encoding. The encoding only comes into play when turning
> bytes into characters or vice versa.
>
> -Kevin
>
>
>
>>>> Jeremy LaCivita <jlacivita(at)broadrelay(dot)com> 10/04/05 3:16 PM >>>
>>>>
> Hmmm
>
> So it turns out that if I take all my Strings and do this:
>
> str = new String(str.getBytes(), "utf-8");
>
> then it works.
>
> Correct me if I'm wrong, but that says to me that the Strings were in
> UTF-8 already, but Java didn't know it, so it couldn't send them to
> Postgres properly.
>
> Because str.getBytes() will return the same bytes that were used to
> create the String, and new String(bytes, "utf-8") will repackage them
> into a String using UTF-8, nothing has really changed at the byte
> level; Java has just explicitly marked it as UTF-8.
>
> Anyway, problem solved. As to why my Strings aren't flagged as
> UTF-8, that's not a Postgres problem.
>
> Thanks!
>
> -jl
>
> On Oct 2, 2005, at 9:41 PM, Oliver Jowett wrote:
>
>
>> Jeremy LaCivita wrote:
>>
>>
>>
>>> PreparedStatement pst = conn.prepareStatement(
>>>     "SELECT * from mytable m where m.title ~* ?");
>>>
>>>
>>
>> If you use direct equality (=), does it work?
>>
>> There have been comments on pgsql-bugs recently that some areas of the
>> backend code (case-insensitive comparison and regexp) do not work
>> correctly in all cases when multibyte encodings are used. You might want
>> to repost to -bugs if basic equality works correctly.
>>
>> Do you have a self-contained test case we can try? In particular, we need
>> to know the actual column values and regexp patterns you have
>> problems with.
>>
>> -O
>>
>>
>
>
>
