Re: unexpected query behavior with UTF text

From: Indra Heckenbach <indra(at)macnica(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: unexpected query behavior with UTF text
Date: 2003-10-23 02:57:49
Message-ID: 3F9743AD.5090209@macnica.com
Lists: pgsql-general

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Hi Tom,<br>
<br>
I solved the problem by doing<br>
<br>
initdb --locale=ja_JP.utf8<br>
<br>
Unfortunately,<br>
<br>
initdb --locale=en_US.utf8<br>
<br>
does not fix the problem. Do you have any idea why?&nbsp; I would think we should be
able to test for equality in any locale.<br>
<br>
thanks,<br>
Indra<br>
<br>
<br>
<br>
Tom Lane wrote:<br>
<blockquote type="cite" cite="mid11056(dot)1066831136(at)sss(dot)pgh(dot)pa(dot)us">
<pre wrap="">Indra Heckenbach <a class="moz-txt-link-rfc2396E" href="mailto:indra(at)macnica(dot)com">&lt;indra(at)macnica(dot)com&gt;</a> writes:
</pre>
<blockquote type="cite">
<pre wrap="">I have recently come across an unusual behavior with Postgres 7.3.4 on a
Linux RH 9 system. My database has encoding set to "UNICODE", and the
table includes Japanese text. I'm trying to issue a query like this:
</pre>
</blockquote>
<!---->">
<blockquote type="cite">
<pre wrap="">SELECT * FROM sales WHERE name='ja-text';
</pre>
</blockquote>
<!---->">
<blockquote type="cite">
This query">
<pre wrap="">This query ignores all Japanese characters in the comparison text. It
matches properly on ASCII characters, but skips right over Japanese ones.
</pre>
</blockquote>
<!---->">
<pre wrap="">The text = operator depends on strcoll(), which is locale-sensitive. It sure
appears that your locale is designed to ignore Japanese characters :-(

</pre>
<blockquote type="cite">
<pre wrap="">I found a related issue on the mailing list, where locale setting was
causing something similar. However, my locale is set to "en_US.UTF-8",
which is the solution proposed to the other problem.
</pre>
</blockquote>
<!---->">
<pre wrap="">
We have heard before that RH9's default locale setting is seriously
broken. This seems to be additional evidence for that opinion. I'd
recommend re-initdb'ing in locale C.

Also, you say "your locale", but how certain are you that that is the
database's locale, and not just the one in your own user environment?
It'd be a good idea to use pg_controldata to check the database settings.

regards, tom lane

</pre>
</blockquote>
<br>
</body>
</html>
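
The point about strcoll() can be tried outside the database. The following is a minimal sketch, not PostgreSQL's own code: it assumes Python's standard locale module, and the helper name equal_under_locale and the sample strings are made up for illustration. It asks the C library whether two strings collate as equal under a given locale, which is roughly the question the text = operator was asking in 7.3. Which locales are installed depends on the system, and a current glibc may well not reproduce the behavior reported for RH 9.

```python
import locale


def equal_under_locale(a: str, b: str, locale_name: str) -> bool:
    """Return True if a and b collate as equal under locale_name.

    Hypothetical helper for illustration only. Raises locale.Error
    if locale_name is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll() returns 0 when the locale treats the strings as equal
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


if __name__ == "__main__":
    a, b = "東京", "大阪"  # two different Japanese strings (sample data)
    for name in ("C", "ja_JP.utf8", "en_US.utf8"):
        try:
            print(name, equal_under_locale(a, b, name))
        except locale.Error:
            print(name, "locale not installed")
```

In locale C the comparison falls back to plain code-point order, so two different strings never compare as equal; that is consistent with the advice above to re-initdb in locale C. A locale whose collation table gives no weight to Japanese characters could instead report the two strings as equal.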
