Re: WIP patch: Collation support

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Radek Strnad <radek(dot)strnad(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: Collation support
Date: 2008-09-10 08:29:14
Message-ID: 48C7855A.40605@enterprisedb.com
Lists: pgsql-hackers

Radek Strnad wrote:
> Progress so far:
> - created catalogs pg_collation and pg_charset which are filled with three
> standard collations
> - initdb changes rows called "DEFAULT" in both catalogs during the bki
> bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by
> command line.
> - new collations can be defined with command CREATE COLLATION <collation
> name> FOR <character set specification> FROM <existing collation name>
> [STRCOLFN <fn name>]
> [ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] [
> LCCTYPE <lc_ctype> ]
> - because pg_collation and pg_charset are catalogs individual for each
> database, if you want to create a database with collation other than
> specified, create it in template1 and then create database

I have to wonder, is all that really necessary? The feature you're
trying to implement is to support database-level collation at first, and
perhaps column-level collation later. We don't need support for
user-defined collations and charsets for that.

If we leave all that out of the patch for now, we'll have a much slimmer,
and just as useful patch, implementing database-level collation. We can
add those catalogs later if we need them, but I don't think there's much
point in adding all that infrastructure if they just reflect the locales
installed in the operating system.

> - when connecting to database, it retrieves locales from pg_database and
> sets them

This is the real gist of this patch.

> Design & functionality changes left:
> - move retrieving collation from pg_database to pg_type

I don't understand this item. What will you move?

> - get case sensitivity and pad characteristic working

I feel we should leave this to the collation implementation.

> - when creating database with different collation than database cluster, the
> database has to be reindexed. Any idea how to do it? Function
> ReindexDatabase works only when database is opened.

That's a tricky one. One idea is to prohibit choosing a different
collation than the one in the template database, unless we know it's
safe to do so without reindexing. The problem is that we don't know
whether it's safe. A simple but limiting solution would be to require
that the template database has the same collation as the database that's
being created, except that template0 can always be used as template.
template0 is safe, because there are no indexes on text columns there.
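
The restriction described above can be sketched as a small check. This is only an illustration of the rule, not code from the patch: the DatabaseInfo type, its fields, and the function name are hypothetical, and the real check would live in the backend's CREATE DATABASE code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseInfo:
    """Hypothetical stand-in for the relevant pg_database fields."""
    name: str
    collation: str


def template_allows_collation(template: DatabaseInfo, requested: str) -> bool:
    """Return True if a database with collation `requested` may be
    created from `template` without any reindexing."""
    # template0 is assumed to contain no indexes on text columns,
    # so it can serve as a template for any collation.
    if template.name == "template0":
        return True
    # Any other template must already use the requested collation,
    # because its text indexes were built under that sort order.
    return template.collation == requested
```

With this rule, a template1 built with en_US.UTF-8 could not be used to create a cs_CZ.UTF-8 database, while template0 could.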

Note that we already have the same problem with encodings. If you create
a database with LATIN1 encoding, load it with data, and then use that as
a template for a database with UTF-8 encoding, the text data will be
incorrectly encoded. We should probably fix that too.
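
To illustrate why the copied data ends up incorrectly encoded, here is a minimal sketch outside of PostgreSQL. It only shows that bytes stored as LATIN1 are generally not valid UTF-8 once they contain non-ASCII characters; the helper name is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same text as it would be stored under each database encoding.
as_latin1 = "café".encode("latin-1")  # 'é' is the single byte 0xE9
as_utf8 = "café".encode("utf-8")      # 'é' is the two bytes 0xC3 0xA9

# Copying the LATIN1 bytes verbatim into a UTF-8 database leaves a lone
# 0xE9 byte, which is not a complete UTF-8 sequence.
latin1_bytes_ok_in_utf8_db = is_valid_utf8(as_latin1)
utf8_bytes_ok_in_utf8_db = is_valid_utf8(as_utf8)
```

Pure ASCII data would survive such a copy, which is why the problem can go unnoticed until non-ASCII text is stored.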

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
