Re: Selecting on non ASCII varchars

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <jlacivita(at)broadrelay(dot)com>, <oliver(at)opencloud(dot)com>
Cc: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Selecting on non ASCII varchars
Date: 2005-10-04 20:37:07
Message-ID: s342a1aa.000@gwmta.wicourts.gov
Lists: pgsql-jdbc

A String object doesn't contain an array of bytes; it contains an
array of characters. Somehow you created String objects from
bytes using the wrong character encoding (not to be confused
with a character set). Your str.getBytes() uses the platform's
default encoding to convert the characters back to bytes. In
this case, it seems that all the characters map back to the
original bytes, although that is not guaranteed to happen with
every default encoding. By specifying "utf-8" in the String
constructor, you're telling it to use that specific encoding
to convert those bytes to characters.
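
As a minimal sketch of the round trip described above (not the code
from the original report): the class and method names are
hypothetical, and ISO-8859-1 stands in for whatever the platform
default encoding was. It shows UTF-8 bytes being decoded with the
wrong encoding, then recovered, and a second case (US-ASCII) where
the original bytes cannot be recovered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative, not from the thread.
class EncodingRoundTrip {
    // Decode UTF-8 bytes with the wrong encoding (assumed: ISO-8859-1).
    public static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: map chars back to the raw bytes, then decode as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // True if decoding then re-encoding with US-ASCII preserves the bytes.
    public static boolean asciiRoundTrips(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.US_ASCII);
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII), bytes);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                      // 4 characters
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 5 bytes

        String garbled = misdecode(utf8);
        System.out.println(garbled.length());               // 5: one char per byte
        System.out.println(repair(garbled).equals(original)); // true

        // US-ASCII has no mapping for bytes >= 0x80, so the bytes are lost.
        System.out.println(asciiRoundTrips(utf8));          // false
    }
}
```

ISO-8859-1 maps every byte value to a character, which is why the
bytes survive there. An encoding with unmapped byte values, such as
US-ASCII, substitutes a replacement character on decode, so the same
trick silently loses data.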

There is nothing in a String object to "flag" it for any particular
encoding. The encoding only comes into play when turning
bytes into characters or vice versa.
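
To illustrate that point, here is a small sketch (the class and
method names are hypothetical): one String yields different byte
sequences depending on the encoding requested, and two Strings
decoded from different bytes compare equal once they hold the same
characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative, not from the thread.
class NoEncodingFlag {
    // Number of bytes the string occupies when encoded with the given charset.
    public static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Decode two different byte sequences that both represent U+00E9.
    public static boolean sameStringFromDifferentBytes() {
        String fromUtf8 = new String(
                new byte[] {(byte) 0xC3, (byte) 0xA9}, StandardCharsets.UTF_8);
        String fromLatin1 = new String(
                new byte[] {(byte) 0xE9}, StandardCharsets.ISO_8859_1);
        return fromUtf8.equals(fromLatin1);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // a single character, e with acute accent

        // The encoding is chosen at conversion time, not stored in the String.
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 1

        // Once decoded, the Strings carry no trace of their source encoding.
        System.out.println(sameStringFromDifferentBytes());                // true
    }
}
```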

-Kevin


>>> Jeremy LaCivita <jlacivita(at)broadrelay(dot)com> 10/04/05 3:16 PM >>>
Hmmm

So it turns out that if I take all my Strings and do this:

str = new String(str.getBytes(), "utf-8");

then it works.

Correct me if I'm wrong, but that says to me that the Strings were in
UTF-8 already, but Java didn't know it, so it couldn't send them to
postgres properly.

Because str.getBytes() will return the same bytes that were used to
create the string, and new String(bytes, "utf-8") will repackage them
into a string using utf-8, so nothing has really changed at the byte
level; Java has just explicitly marked it as UTF-8.

Anyway, problem solved. As to why my strings aren't flagged as
UTF-8, that's not a postgres problem.

Thanks!

-jl

On Oct 2, 2005, at 9:41 PM, Oliver Jowett wrote:

> Jeremy LaCivita wrote:
>
>
>> PreparedStatement pst = conn.prepareStatement(
>>     "SELECT * from mytable m where m.title ~* ?");
>>
>
> If you use direct equality (=), does it work?
>
> There have been comments on pgsql-bugs recently that some areas of the
> backend code (case insensitive comparison and regexp) do not work
> correctly in all cases when multibyte encodings are used. You might
> want to repost to -bugs if basic equality works correctly.
>
> Do you have a self-contained test case we can try? In particular we need
> to know the actual column values and regexp patterns you have
> problems with.
>
> -O
>
