From: | Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> |
---|---|
To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [WIP] collation support revisited (phase 1) |
Date: | 2008-07-22 17:03:26 |
Message-ID: | 488612DE.5060206@sun.com |
Lists: | pgsql-hackers |
Martijn van Oosterhout wrote:
> On Mon, Jul 21, 2008 at 03:15:56AM +0200, Radek Strnad wrote:
>> I was trying to sort out the problem with not creating a new catalog for
>> character sets and I came up with the following ideas. Correct me if my
>> ideas are wrong.
>>
>> Since collation has to have a defined character set.
>
> Not really. AIUI at least glibc and ICU define a collation over all
> possible characters (i.e. Unicode). When you create a locale you take a
> subset and use that. Think about it: if you want to sort strings and
> one of them happens to contain a Chinese character, it can't *fail*.
> Note strcoll() has no error return for unknown characters.
It does have one.
See http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html
The strcoll() function may fail if:
[EINVAL]
[CX] The s1 or s2 arguments contain characters outside the domain of
the collating sequence.
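Since strcoll() reserves no return value for errors, the same page advises setting errno to 0 before the call and checking it afterwards. A minimal sketch of that pattern follows; checked_strcoll() is a hypothetical helper name used only for illustration, not an existing PostgreSQL or libc function, and whether a given libc ever sets EINVAL here is implementation-dependent.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare a and b with strcoll()
 * under the current LC_COLLATE locale and report whether the call set errno.
 *
 * Returns 0 on success and stores the comparison result in *result.
 * Returns -1 if strcoll() set errno (for example EINVAL); errno is left
 * set for the caller to inspect.
 */
static int
checked_strcoll(const char *a, const char *b, int *result)
{
    assert(a != NULL && b != NULL && result != NULL);

    errno = 0;                  /* no return value is reserved for errors */
    *result = strcoll(a, b);

    return (errno != 0) ? -1 : 0;
}
```

A caller would treat a -1 return as "these strings cannot be ordered by this collation" rather than trusting the value stored in *result.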
>> I'm suggesting to use
>> already written infrastructure of encodings and to use list of encodings in
>> chklocale.c. Currently databases are not created with specified character
>> set but with specified encoding. I think instead of pointing a record in
>> collation catalog to another record in character set catalog we might use
>> only name (string) of the encoding.
>
> That's reasonable. From an abstract point of view collations and
> encodings are orthogonal, it's only when you're using POSIX locales
> that there are limitations on how you combine them. I think you can
> assume a collation can handle any characters that can be produced by
> encoding.
I don't think that is correct. You cannot use one collation over all of
Unicode. See http://www.unicode.org/reports/tr10/#Common_Misperceptions.
The same characters can be ordered differently in different languages.
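To see the language dependence in practice, one can compare the same pair of strings under two LC_COLLATE settings. The sketch below is only an illustration: compare_in_locale() is a hypothetical helper, the locale names passed to it are assumptions about what is installed on the system, and the outcome depends on the libc's locale data.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare a and b with strcoll()
 * under the named LC_COLLATE locale, then restore the previous setting.
 *
 * Returns 0 and stores the comparison in *result, or returns -1 if the
 * requested locale is not available or the current locale name cannot
 * be saved.
 */
static int
compare_in_locale(const char *locale_name, const char *a, const char *b,
                  int *result)
{
    char        saved[256];
    const char *current;

    assert(locale_name != NULL && a != NULL && b != NULL && result != NULL);

    /* Copy the current name: later setlocale() calls may overwrite it. */
    current = setlocale(LC_COLLATE, NULL);
    if (current == NULL || strlen(current) >= sizeof(saved))
        return -1;
    snprintf(saved, sizeof(saved), "%s", current);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;              /* locale not installed on this system */

    *result = strcoll(a, b);

    setlocale(LC_COLLATE, saved);
    return 0;
}
```

For example, calling it with "cs_CZ.UTF-8" and then "en_US.UTF-8" on the strings "chata" and "dum" may give opposite signs, because Czech treats "ch" as a separate letter that sorts after "h".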
Zdenek
--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql