Re: Selecting on non ASCII varchars

From: Vadim Nasardinov <vadimn(at)redhat(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Selecting on non ASCII varchars
Date: 2005-10-04 21:14:01
Message-ID: 200510041714.01768.vadimn@redhat.com
Lists: pgsql-jdbc

On Tuesday 04 October 2005 16:16, Jeremy LaCivita wrote:
> Hmmm
>
> so it turns out if i take all my Strings and do this:
>
> str = new String(str.getBytes(), "utf-8");
>
> then it works.
>
> Correct me if i'm wrong, but that says to me that the Strings were
> in UTF-8 already, but Java didn't know it, so it couldn't send them
> to postgres properly.

It's meaningless to ask what encoding a String has. Strings are
sequences of chars; they don't have an encoding. The notion of
"encoding" comes into play only when you have to represent a String as
a sequence of bytes.
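To make the chars-versus-bytes point concrete, here is a small sketch. It encodes one String with two charsets and compares the char count with the byte counts. The class name `CharsVsBytes` and the helper `encodedLength` are hypothetical names made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {
    // Hypothetical helper: number of bytes needed to represent s in charset cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // A String is just a sequence of chars; it carries no encoding of its own.
        String str = "Funny char: \u00e8";
        System.out.println("chars:            " + str.length());

        // Bytes only appear when we encode, and they depend on the charset chosen.
        System.out.println("UTF-8 bytes:      " + encodedLength(str, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + encodedLength(str, StandardCharsets.ISO_8859_1));
    }
}
```

The String has 13 chars. Its UTF-8 form needs 14 bytes, because è becomes 0xC3 0xA8. Its ISO-8859-1 form needs 13 bytes, because è becomes the single byte 0xE8.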

So, if this returns true for you:

str.equals(new String(str.getBytes(), "utf-8"));

that means your default encoding is either UTF-8 or an encoding that
turns the characters found in str into the same bytes UTF-8 would (for
example, any ASCII-compatible encoding when str is pure ASCII).
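The round-trip check above can be written so it does not depend on the JVM's actual default. This minimal sketch passes the would-be default charset in explicitly. The class `RoundTrip` and the method `survivesUtf8RoundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Hypothetical helper mirroring str.equals(new String(str.getBytes(), "utf-8")),
    // with the "default" charset supplied explicitly instead of taken from the JVM.
    static boolean survivesUtf8RoundTrip(String str, Charset assumedDefault) {
        byte[] bytes = str.getBytes(assumedDefault);                // encode with the "default"
        String decoded = new String(bytes, StandardCharsets.UTF_8); // decode as UTF-8
        return str.equals(decoded);
    }

    public static void main(String[] args) {
        String funny = "Funny char: \u00e8";
        String plain = "Plain ASCII";
        System.out.println(survivesUtf8RoundTrip(funny, StandardCharsets.UTF_8));      // true
        System.out.println(survivesUtf8RoundTrip(funny, StandardCharsets.ISO_8859_1)); // false
        System.out.println(survivesUtf8RoundTrip(plain, StandardCharsets.ISO_8859_1)); // true
    }
}
```

With ISO-8859-1, è is encoded as the lone byte 0xE8. That byte is not valid UTF-8 on its own, so decoding yields a replacement character and the comparison fails. A pure-ASCII string survives either way, which is why the result only says something about the characters actually present in str.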

String#getBytes() uses the platform's default encoding, which on
Unix-like systems is derived from the locale settings, such as the
environment variable LANG.
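Here is a small sketch showing which charset the no-argument `getBytes()` picks up and how naming a charset sidesteps it. The class name `DefaultCharsetDemo` is hypothetical. Note an assumption about newer JVMs: on JDK 18 and later the default charset is UTF-8 unless explicitly overridden, so LANG no longer changes it there. On older JVMs the output varies with the locale, as in the sessions below.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        String str = "Funny char: \u00e8";

        // The charset used by no-argument String#getBytes() and new String(byte[]).
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // getBytes() with no argument encodes with the default charset...
        boolean sameAsDefault =
            Arrays.equals(str.getBytes(), str.getBytes(Charset.defaultCharset()));
        System.out.println("getBytes() matches default charset: " + sameAsDefault);

        // ...whereas naming the charset gives the same bytes on every machine.
        byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
        System.out.println("explicit UTF-8 length: " + utf8.length);
    }
}
```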

So, if my default encoding is UTF-8, I get this:

| $ echo $LANG
| en_US.UTF-8
| $ bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat(at)pat(dot)net)
| bsh % print(System.getProperty("file.encoding"));
| UTF-8
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| true
| bsh %

If I change the default encoding to ISO-8859-1, I get this:

| $ env LANG=en_US.iso88591 bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat(at)pat(dot)net)
| bsh % print(System.getProperty("file.encoding"));
| ISO-8859-1
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| false
| bsh %
