Re: Japanese words not distinguished

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Harry Mantheakis <harry(at)mantheakis(dot)freeserve(dot)co(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Japanese words not distinguished
Date: 2005-07-12 16:51:42
Message-ID: 20221.1121187102@sss.pgh.pa.us
Lists: pgsql-general

Harry Mantheakis <harry(at)mantheakis(dot)freeserve(dot)co(dot)uk> writes:
> I run PostgreSQL 7.4.6 on Linux with a JDBC client.

> I initialised my database cluster with the following initdb command:

> initdb --locale=en_GB.UTF-8 --encoding UNICODE

> I have now discovered that my database cannot distinguish Japanese names or
> words - it throws unique constraint errors on a composite primary key that
> includes a VARCHAR field which stores the names or words.

> My tests indicate that the database treats all Japanese names/words as
> equal.

Hmm, is that actually the correct spelling of the locale? On my Linux
box, locale -a says it's "en_GB.utf8". I'm not sure how well initdb can
verify the validity of a locale parameter, especially back in the 7.4
branch. It could be that you are actually using a locale that doesn't
use UTF8 encoding, in which case this behavior is not unheard of
(still pretty broken, IMHO, but I've seen plenty of locale definitions
that just fail on data outside their supported character set).
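One quick way to test the spelling question on a given machine is to ask the C library whether it accepts each candidate name. The Python sketch below does that. It assumes that Python's `locale.setlocale` is a fair proxy for what the system's C library will accept. It does not reproduce whatever validation initdb 7.4 itself performs. The function name `locale_is_available` is hypothetical.

```python
import locale

def locale_is_available(name):
    """Return True if the C library accepts `name` as an LC_COLLATE locale.

    Hypothetical helper: it only asks setlocale(); it says nothing about
    how initdb validated the same string.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Put the original collation locale back either way.
        locale.setlocale(locale.LC_COLLATE, saved)
```

For example, compare `locale_is_available("en_GB.UTF-8")` with `locale_is_available("en_GB.utf8")` on the database server. The answers depend entirely on which locales are installed there.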

If you did correctly specify a UTF8-using locale, you probably ought to
report this behavior to your Linux supplier as a bug in that locale
definition. It doesn't have to sort or case-fold random UTF8 data very
nicely, but it certainly shouldn't report distinct strings as equal.

regards, tom lane
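The symptom described above is a locale that reports distinct strings as equal. It can be checked outside the database with the Python sketch below, which is illustrative only. The sketch assumes that `locale.strcoll` exposes the same C-library collation the server relies on. It also assumes that a result of 0 for two different strings is the behaviour that would trip a unique constraint. The function name `collation_collisions` is hypothetical.

```python
import locale
from itertools import combinations

def collation_collisions(words, name):
    """Return pairs of distinct strings that strcoll() treats as equal
    under locale `name`. Raises locale.Error if `name` is unknown.

    Hypothetical diagnostic, not part of PostgreSQL.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        # A correct locale should never return 0 for two different strings.
        return [(a, b) for a, b in combinations(words, 2)
                if a != b and locale.strcoll(a, b) == 0]
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
```

Running `collation_collisions(["山田", "田中"], "en_GB.UTF-8")` on the affected machine should return an empty list. A non-empty result would suggest the locale definition itself is at fault. Under the "C" locale, comparison is by character value, so distinct strings should never collide.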
