Re: [WIP] patch - Collation at database level

From: "Radek Strnad" <radek(dot)strnad(at)gmail(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] patch - Collation at database level
Date: 2008-08-02 13:39:18
Message-ID: de5165440808020639j4e7226bbu32e840e225e15c3e@mail.gmail.com
Lists: pgsql-hackers

Hello,

the main reason I submitted the patch was to start a discussion and hear
other people's opinions on this problem.

On Tue, Jul 29, 2008 at 10:41 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

>
> Where are the collations going to come from?

There will be two new catalogs, pg_collate and pg_charset. They will be
populated with the ANSI-standard collations and character sets (ISO8BIT,
LATIN1, UTF-8, ...) and, additionally, with the default collation chosen at
creation time. For instance, if you create a database cluster with initdb and
specify en_US.utf8, template0 will contain the standard rows (ISO8BIT, LATIN1,
UTF-8, ...) plus one row for en_US.utf8. You can then connect to template0,
create further collations if your POSIX locales support them, and use them,
one collation per database.
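A minimal sketch of what template0's catalogs might contain right after initdb. The row layouts (`CharsetRow`, `CollateRow`), their columns, the OID values, and the one-collation-per-standard-charset rule are all hypothetical illustrations, not taken from the patch:

```python
from dataclasses import dataclass

# Hypothetical row layouts for the proposed pg_charset / pg_collate catalogs.
@dataclass(frozen=True)
class CharsetRow:
    oid: int
    name: str

@dataclass(frozen=True)
class CollateRow:
    oid: int
    name: str
    charset_oid: int

# Standard character sets mentioned in the message; OIDs below are made up.
STANDARD_CHARSETS = ["ISO8BIT", "LATIN1", "UTF-8"]

def bootstrap_catalogs(initdb_locale, initdb_charset="UTF-8"):
    """Return the (charsets, collations) rows template0 would start with:
    the standard entries plus one collation for the locale given to initdb."""
    charsets = {name: CharsetRow(oid, name)
                for oid, name in enumerate(STANDARD_CHARSETS, start=1)}
    # Assumption: one built-in collation per standard character set.
    collations = [CollateRow(cs.oid, cs.name, cs.oid) for cs in charsets.values()]
    # The extra row for the locale chosen at initdb time.
    collations.append(CollateRow(len(collations) + 1, initdb_locale,
                                 charsets[initdb_charset].oid))
    return list(charsets.values()), collations
```

For example, `bootstrap_catalogs("en_US.utf8")` yields the three standard collations followed by an `en_US.utf8` row that points at the UTF-8 charset.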

> Have the various build and distributions issues been thought about?

Yes. Since POSIX locales don't guarantee that any particular collation is
available, a set of hard-coded collations following the ANSI collation
standard will be built in. Others can be added with the CREATE COLLATION
command.
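Because a POSIX locale may or may not exist on a given host, CREATE COLLATION would need to check availability before registering a locale-backed collation. A minimal sketch, assuming a plain set stands in for the catalog; `HARD_CODED`, `create_collation`, and `UnsupportedLocaleError` are hypothetical names:

```python
import locale

# Assumed names of the built-in (hard-coded) collations.
HARD_CODED = frozenset({"ISO8BIT", "LATIN1", "UTF-8"})

class UnsupportedLocaleError(Exception):
    """Raised when the host has no POSIX locale with the requested name."""

def posix_locale_available(name):
    """Return True if setlocale() accepts `name` for LC_COLLATE.
    The previous LC_COLLATE setting is restored afterwards."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

def create_collation(catalog, name):
    """Hypothetical CREATE COLLATION: add `name` to `catalog` (a set) if it
    is hard-coded or backed by an installed POSIX locale."""
    if name in catalog:
        raise ValueError(f"collation {name!r} already exists")
    if name not in HARD_CODED and not posix_locale_available(name):
        raise UnsupportedLocaleError(name)
    catalog.add(name)
```

Hard-coded collations skip the locale check entirely, which is what lets them work on hosts with no usable POSIX locales.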

> How are they going to be configured (not the SQL syntax, but how will the
> configuration be applied)?

The pg_type, pg_attribute and pg_namespace catalogs of each database will be
extended with a collation OID column that specifies the collation.
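With a collation OID column in three catalogs, something has to decide which one applies. The message does not state a precedence, so the order below (column, then type, then schema, then the database default) is purely an assumption for illustration; 0 plays the role of PostgreSQL's `InvalidOid`, meaning "not set":

```python
INVALID_OID = 0  # PostgreSQL's InvalidOid: "no collation set here"

def resolve_collation(attr_coll, type_coll, namespace_coll, database_coll):
    """Hypothetical lookup: the most specific collation OID that is set wins
    (pg_attribute, then pg_type, then pg_namespace), else the database's."""
    for oid in (attr_coll, type_coll, namespace_coll):
        if oid != INVALID_OID:
            return oid
    return database_coll
```

With collation support only at the database level, all three columns would presumably stay unset, so every lookup falls through to the database default.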

> How are the collations going to be applied at run-time?

The collation will be set when connecting to the database, by calling
setlocale(LC_COLLATE, XXX) and setlocale(LC_CTYPE, XXX).
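The effect of switching locale categories at connect time can be sketched with Python's `locale` module, which wraps the same C calls. The POSIX categories involved are LC_COLLATE (string ordering) and LC_CTYPE (character classification). `apply_database_collation` and `collated_sort` are hypothetical names, and the "C" locale is used only because it exists on every system:

```python
import locale

def apply_database_collation(collation_name):
    """Hypothetical connection-time hook: mirror the C calls
    setlocale(LC_COLLATE, name) and setlocale(LC_CTYPE, name)."""
    locale.setlocale(locale.LC_COLLATE, collation_name)
    locale.setlocale(locale.LC_CTYPE, collation_name)

def collated_sort(strings):
    """Sort according to the current LC_COLLATE (strxfrm keys are
    equivalent to comparing with strcoll)."""
    return sorted(strings, key=locale.strxfrm)
```

Under the "C" locale, `collated_sort(["b", "B", "a"])` gives `["B", "a", "b"]` (code-point order); a locale such as en_US.utf8, if installed, would typically order the same words differently. Because setlocale() is process-wide, this fits one collation per backend, and hence per database.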

> How are you going to handle locale and encoding conflicts?

Since I'm currently implementing collation support only per database, I
don't think this is an issue yet. (I know it will be one in the future.)

> I also think that the clauses you have attached to your CREATE COLLATION
> statement (case-insensitive, accent-insensitive) are an oversimplification
> of reality. I suggest you look up the Unicode collation algorithm to learn
> about how collations work in practice.

I already did so at the very beginning of development. The reason I'm not
implementing the whole Unicode collation algorithm is that this patch is
meant to be a framework: you'll be able to use collation functions other
than POSIX locales, so further development toward the full Unicode collation
algorithm remains possible.
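One way to read "framework" is a registry of interchangeable comparison functions, so that a POSIX-locale provider and a future Unicode-collation-algorithm provider can sit side by side. A minimal sketch; the registry, the provider names "binary" and "posix", and `sort_with` are all hypothetical, and no UCA provider is implemented here:

```python
import functools
import locale

# Hypothetical registry: provider name -> three-way compare function
# returning negative, zero or positive (like a C strcmp-style callback).
PROVIDERS = {}

def register_provider(name):
    def decorator(fn):
        PROVIDERS[name] = fn
        return fn
    return decorator

@register_provider("binary")
def binary_compare(a, b):
    """Compare by UTF-8 byte order, independent of any locale."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)

@register_provider("posix")
def posix_compare(a, b):
    """Delegate to the process's current LC_COLLATE via strcoll()."""
    return locale.strcoll(a, b)

def sort_with(provider, strings):
    """Sort `strings` with the comparison function of the named provider."""
    return sorted(strings, key=functools.cmp_to_key(PROVIDERS[provider]))
```

A later UCA-based provider would be one more `@register_provider(...)` function; callers that select a provider by name would not change.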

At the end of next week I'll publish my bachelor's thesis on this topic, in
which everything will be explained in detail, so stay tuned.

Regards

Radek Strnad
