
Re: BUG #1268: Two different Unicode chars are treated as

From: Kent Tong <kent(at)cpttm(dot)org(dot)mo>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1268: Two different Unicode chars are treated as
Date: 2004-09-24 03:51:40
Message-ID: 415399CC.9030008@cpttm.org.mo
Lists: pgsql-bugs
Tom Lane wrote:

> "PostgreSQL Bugs List" <pgsql-bugs(at)postgresql(dot)org> writes:
> 
>> Description: Two different Unicode chars are treated as equal in a query
>
> This would be a matter to take up with the maintainer of your locale
> (which you didn't mention, but in any case it's a locale bug).  We
> just do what strcoll() tells us.

Thanks for the quick reply. The system locale is zh_TW.Big5. However,
I've tried setting it to "C" but the test case still fails.

To check whether it's a locale bug, I wrote a C program:

#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* UTF-8 byte sequences for two distinct characters */
    char *s1 = "\xe4\xba\x8c";   /* U+4E8C */
    char *s2 = "\xe4\xba\x94";   /* U+4E94 */
    setlocale(LC_ALL, "en.UTF-8");
    //setlocale(LC_ALL, "zh.Big5"); //doesn't make any difference
    printf("%d\n", strcoll(s1, s2));
    return 0;
}

I compiled it and ran it on that computer. It prints -1, which
suggests that strcoll is working.

> Note that it's possible this is a configuration error and not an
> outright bug.  Check to make sure that the locale you initdb'd
> under is actually designed to work with UTF-8 data.

Does it matter? The encoding provided to initdb is just
a default for the databases to be created in the future.
When I used createdb, I did specify "-E unicode".

-- 
Kent Tong, Msc, MCSE, SCJP, CCSA, Delphi Certified
Manager of IT Dept, CPTTM
Authorized training for Borland, Cisco, Microsoft, Oracle, RedFlag & RedHat
