[WIP] collation support revisited (phase 1)

From: "Radek Strnad" <radek(dot)strnad(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [WIP] collation support revisited (phase 1)
Date: 2008-07-10 21:24:29
Message-ID: de5165440807101424l14fb535byf43fc665351c4dfd@mail.gmail.com
Lists: pgsql-hackers

Hi,

After a long discussion with Mr. Kotala, we've decided to redesign our
collation support proposal.
For those of you who aren't familiar with my WIP patch and the comments from
other hackers, here's the original mail:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00019.php

In a few sentences - I'm writing collation support for PostgreSQL that is
almost independent of the collating function used. I will implement POSIX
locales, but a switch to ICU will be quite easy. Collations and character sets
defined by the SQL standard will be hard coded, so that they always exist and
functions never have to deal with them being missing.
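
To illustrate what "almost independent of the collating function" could
mean, here is a minimal Python sketch (not the patch itself, which is C code
inside PostgreSQL). The registry COLLATIONS, the names "builtin_codepoint"
and "system_posix", and the helper sort_with are all hypothetical. The idea
is only that callers go through a named collation, so swapping POSIX locales
for ICU would mean registering a different key function.

```python
import locale
from typing import Callable, Dict, Iterable, List

# A key function maps a string to a value that can be compared directly.
KeyFunc = Callable[[str], object]


def codepoint_key(s: str) -> object:
    """Order by code point, independent of any system locale."""
    return s


def posix_key(s: str) -> object:
    """Order by the process's current LC_COLLATE setting (POSIX locales)."""
    return locale.strxfrm(s)


# Hypothetical registry: collation name -> key function.
# The hard-coded entry is always present; the POSIX one depends on the system.
COLLATIONS: Dict[str, KeyFunc] = {
    "builtin_codepoint": codepoint_key,
    "system_posix": posix_key,
}


def sort_with(collation: str, values: Iterable[str]) -> List[str]:
    """Sort values using the named collation, whatever backend provides it."""
    return sorted(values, key=COLLATIONS[collation])
```

For example, sort_with("builtin_codepoint", ["b", "a", "C"]) returns
["C", "a", "b"], because upper-case letters have lower code points. An ICU
backend would be added as one more entry in COLLATIONS without touching
sort_with or its callers.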

The whole project will be divided into two phases:

phase 1
Implement a "sort of framework" so that PostgreSQL will have the basic guts
(pg_collation & pg_charset catalogs, CREATE COLLATION, collation support
for each type that needs it) and will support collation at the database
level. This phase has been accepted as a Google Summer of Code project.

phase 2
Implement the rest - full collation at the column level. I will continue
working on this after finishing phase one, and it will be my master's thesis.

How will the first part work?

Catalogs
- new catalogs pg_collation and pg_charset will be defined
- pg_collation and pg_charset will contain the SQL standard collations plus
an optional default collation (when the default is set to something other
than a SQL standard one)
- pg_type, pg_attribute, pg_namespace will be extended with references to
the default records in pg_collation and pg_charset

initdb
- pg_collation & pg_charset will each contain the predefined records required
by the SQL standard and optionally one non-standard record that is set when
running initdb (the one using system locales)
- the two default records (one per catalog) will be referenced by pg_type,
pg_attribute, pg_namespace in the relevant columns and will be considered the
default collation that will be inherited
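
The catalog layout and initdb behaviour described above can be modelled
roughly as follows. This is a toy Python sketch, not catalog DDL: the
Collation and Catalogs classes, the initdb function signature, and the
placeholder names "STD_A" and "STD_B" are hypothetical, pg_charset is left
out for brevity, and picking the first standard record as the default when
no system locale is given is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Collation:
    name: str
    standard: bool  # True for records predefined per the SQL standard


@dataclass
class Catalogs:
    pg_collation: List[Collation]
    # Stand-in for the new reference columns: catalog name -> default collation
    default_collation: Dict[str, str]


def initdb(standard_names: List[str],
           system_locale: Optional[str] = None) -> Catalogs:
    """Populate pg_collation and point the referencing catalogs at a default."""
    records = [Collation(name, standard=True) for name in standard_names]
    if system_locale is None:
        # Assumption: fall back to the first standard record.
        default = standard_names[0]
    else:
        default = system_locale
        if system_locale not in standard_names:
            # The one optional non-standard record, taken from system locales.
            records.append(Collation(system_locale, standard=False))
    refs = {cat: default for cat in ("pg_type", "pg_attribute", "pg_namespace")}
    return Catalogs(pg_collation=records, default_collation=refs)
```

Calling initdb(["STD_A", "STD_B"], "cs_CZ.UTF-8") gives three pg_collation
records, with all three referencing catalogs pointing at "cs_CZ.UTF-8".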

CREATE DATABASE ... COLLATE ...
- after copying the new database, the collation will be the default (same as
the cluster collation) or changed by the COLLATE clause. Then we update the
pg_type, pg_attribute and pg_namespace catalogs
- reindex the database

When switching to another database, the database collation will be retrieved
from the type text in pg_type. This part should be the only one that will be
removed when proceeding with phase 2. But that will take a while :-)
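
As a rough illustration of the CREATE DATABASE ... COLLATE steps and of the
lookup through type text, here is a small Python model. Everything in it is
hypothetical (the dictionary layout, make_cluster_template, create_database,
database_collation, and the sample object names), and it assumes that the
reindex is only needed when COLLATE actually changes the collation.

```python
import copy
from typing import Dict, List, Optional

# Catalogs that carry a reference to the default collation in this proposal.
REFERENCING_CATALOGS = ("pg_type", "pg_attribute", "pg_namespace")

Database = Dict[str, Dict[str, str]]  # catalog -> {object name -> collation}


def make_cluster_template(collation: str) -> Database:
    """Build a toy template database using the cluster-wide collation."""
    return {
        "pg_type": {"text": collation, "varchar": collation},
        "pg_attribute": {"t.name": collation},
        "pg_namespace": {"public": collation},
    }


def create_database(template: Database,
                    collate: Optional[str] = None,
                    log: Optional[List[str]] = None) -> Database:
    """Copy the template, then apply COLLATE by rewriting the references."""
    db = copy.deepcopy(template)  # step 1: copy the new database
    if collate is not None:
        for catalog in REFERENCING_CATALOGS:  # step 2: update the catalogs
            for obj in db[catalog]:
                db[catalog][obj] = collate
        if log is not None:
            log.append("reindex")  # step 3 (assumed only needed on change)
    return db


def database_collation(db: Database) -> str:
    """Phase 1 lookup: the database collation is whatever type text uses."""
    return db["pg_type"]["text"]
```

With a template built by make_cluster_template("cluster_default"), calling
create_database(template, "other") yields a database whose
database_collation is "other", while the template itself is unchanged.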

Thanks for all your comments

Regards

Radek Strnad
