Re: [WIP] collation support revisited (phase 1)

From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] collation support revisited (phase 1)
Date: 2008-07-22 14:32:39
Message-ID: 4885EF87.4020608@sun.com
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote:
>> Background:
>> We specify the encoding in the initdb phase. ANSI specifies repertoire,
>> charset, encoding and collation. If I understand it correctly, a charset is
>> a subset of a repertoire and specifies the list of characters allowed for a
>> language->collation. An encoding is the mapping of a character set to a
>> binary format. For example, for the Czech alphabet (charset) we have 6
>> different 8-bit encodings, while on the other side multiple charsets are
>> specified for UTF8.
>
> Oh, so you're thinking of a charset as a sort of check constraint. If
> your locale is Turkish and you have a column marked charset ASCII then
> storing lower('HI') results in an error.

Yeah. The strcoll function may fail (setting errno to EINVAL) when a string
contains characters outside the domain of the collating sequence. See
http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html
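
The check-constraint reading of a charset in the Turkish example quoted above
can be sketched roughly as follows. This is only an illustration, not
PostgreSQL code: turkish_lower and fits_charset are hypothetical helpers, the
Turkish case mapping is hand-written and simplified, and a Python codec name is
used as a stand-in for a charset.

```python
# Hypothetical helpers for illustration only; not PostgreSQL code.

def turkish_lower(text: str) -> str:
    """Lowercase using a simplified, hand-written Turkish rule for I."""
    # Turkish maps capital I to dotless small i (U+0131) and
    # dotted capital I (U+0130) to plain i; everything else lowers normally.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text is representable in charset.

    Assumption: a codec name stands in for the charset of a column.
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


lowered = turkish_lower("HI")
print(fits_charset("hi", "ascii"))         # plain ASCII passes the check
print(fits_charset(lowered, "ascii"))      # U+0131 is outside ASCII
print(fits_charset(lowered, "iso8859_9"))  # Latin-5 (Turkish) accepts it
```

In this sketch the same lower('HI') result is rejected for an ASCII column but
accepted for a Turkish 8-bit charset, which is the constraint-like behaviour
described above.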

> A collation must be defined over all possible characters, it can't
> depend on the character set. That doesn't mean sorting in en_US must do
> something meaningful with Japanese characters, it does mean it can't
> throw an error (the usual procedure is to sort on Unicode code point).

A collation cannot be defined over arbitrary characters. There is no relation
between Latin and Chinese characters. A collation makes sense only when you are
able to define the <, = and > operators.

If you need to compare Japanese and Latin characters, ANSI specifies a default
collation for each repertoire. I think it is usually a bitwise comparison.
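
As a rough illustration of such a fallback (an assumption about what a default
could look like, not what ANSI or PostgreSQL mandates), the sketch below sorts
a mixed Latin/Japanese list once by Unicode code point and once by raw UTF-8
bytes. The helper bytewise_key and the sample words are made up for the
example. For UTF-8 the two orders coincide, because the encoding preserves
code point order.

```python
# Illustrative sketch: code point order versus bytewise order of UTF-8.

def bytewise_key(text: str, encoding: str = "utf-8") -> bytes:
    """Sort key that compares strings by their encoded bytes."""
    return text.encode(encoding)


words = ["zebra", "日本", "Äpfel", "apple"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Comparing the encoded bytes is a plain bitwise comparison.
by_utf8_bytes = sorted(words, key=bytewise_key)

print(by_code_point)
print(by_utf8_bytes)
print(by_code_point == by_utf8_bytes)
```

Neither order is linguistically meaningful (here "Äpfel" sorts after "zebra"),
but both are total orders, so a comparison never has to raise an error.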

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql
