Re: WIP patch: Collation support

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: Collation support
Date: 2008-09-23 09:20:49
Message-ID: 48D8B4F1.6040309@enterprisedb.com
Lists: pgsql-hackers

Committed.

Tom Lane wrote:
> * You should try to get rid of LOCALE_NAME_BUFLEN altogether. Definitely
> the comment about it in pg_control.h is now obsolete.

Yep. I removed LOCALE_NAME_BUFLEN. The real max length of a locale name
is now NAMEDATALEN, because it's stored in a name field in pg_database.
NAMEDATALEN is only 64 bytes, whereas LOCALE_NAME_BUFLEN was 128. 64
bytes should be enough for "en_GB.UTF8" style locale names, but I wonder
if it's enough for the longer names used on Windows? Could someone
confirm that, please?
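
For illustration, here is a minimal C sketch of the length constraint, assuming the default NAMEDATALEN of 64. A name column keeps at most NAMEDATALEN - 1 bytes plus a terminating NUL, so a locale name has to be strictly shorter than NAMEDATALEN. The helper locale_name_fits() is hypothetical, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: PostgreSQL's default NAMEDATALEN. A "name" value holds at
 * most NAMEDATALEN - 1 bytes, followed by a terminating NUL. */
#define NAMEDATALEN 64

/* Hypothetical helper: can this locale name be stored in a name column
 * (such as the one in pg_database) without truncation? */
static bool locale_name_fits(const char *locale)
{
    assert(locale != NULL);
    return strlen(locale) < NAMEDATALEN;
}
```

A short Unix-style name such as "en_GB.UTF8" passes easily; whether the longer Windows names stay under the limit is exactly the open question above.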

> An important restriction, however, is that each database's character set
> must be compatible with the database's <envar>LC_CTYPE</> setting.
>
> Also I wonder whether we shouldn't say that it must be compatible with
> LC_CTYPE *and* LC_COLLATE.

I think we should, but that's not actually what is checked. Before the
patch, too, we only checked that the encoding matches LC_CTYPE; LC_COLLATE
could be set to anything. I'll work on a follow-up patch to tighten that.
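
A minimal sketch of the stricter rule, requiring the encoding to match both LC_CTYPE and LC_COLLATE. Purely for illustration, it assumes the codeset can be read from the part of the locale name after the '.' (as in "en_GB.UTF8") and that C and POSIX are acceptable with any encoding. All function names are hypothetical; a real check would ask the C library for the locale's codeset (e.g. via nl_langinfo(CODESET)) and map encoding aliases, whereas this sketch would, for instance, reject LATIN1 against ISO-8859-1.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy up to n chars of s into out, lowercased, dropping '-' and '_',
 * so that "UTF-8", "utf8" and "UTF8" all become "utf8". */
static void normalize(const char *s, size_t n, char *out, size_t outlen)
{
    size_t j = 0;
    for (size_t i = 0; i < n && s[i] != '\0' && j + 1 < outlen; i++) {
        unsigned char c = (unsigned char) s[i];
        if (c == '-' || c == '_')
            continue;
        out[j++] = (char) tolower(c);
    }
    out[j] = '\0';
}

/* Hypothetical check: does the codeset suffix of a locale name match the
 * encoding name? C and POSIX are accepted with any encoding. */
static bool locale_matches_encoding(const char *locale, const char *encoding)
{
    assert(locale != NULL && encoding != NULL);
    if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        return true;

    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return false;                 /* no codeset in the name: cannot verify */

    const char *cs = dot + 1;
    size_t cslen = strcspn(cs, "@");  /* ignore any "@modifier" suffix */

    char a[64], b[64];
    normalize(cs, cslen, a, sizeof a);
    normalize(encoding, strlen(encoding), b, sizeof b);
    return strcmp(a, b) == 0;
}

/* The stricter rule: the encoding must suit BOTH LC_CTYPE and LC_COLLATE. */
static bool database_locale_ok(const char *encoding,
                               const char *lc_collate, const char *lc_ctype)
{
    return locale_matches_encoding(lc_ctype, encoding) &&
           locale_matches_encoding(lc_collate, encoding);
}
```

With only the LC_CTYPE test, a combination like encoding UTF8 with LC_COLLATE "de_DE.ISO-8859-1" and LC_CTYPE "en_GB.UTF8" slips through; the combined check rejects it.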

> * This makes sense, but then shouldn't we make the identical restriction
> for encoding?
>
> + The <literal>COLLATE</> and <literal>CTYPE</> settings must match
> + those of the template database, except when template0 is used as
> + template. This is because <literal>COLLATE</> and <literal>CTYPE</>

It wouldn't be as bullet-proof for encoding, because we'd still have the
problem that the encoding in the shared system tables would be
ill-defined. That's a pre-existing problem, though. We could simply
remove support for per-database encodings altogether and fix it at
initdb time, as Martijn suggested earlier, but now that we have
per-database locales, per-database encodings are a lot more useful as well.
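
The template rule from the quoted documentation, plus the encoding variant Tom asks about, might be sketched like this. struct db_locale and create_database_allowed() are hypothetical names, and the check_encoding flag only toggles the stricter rule; it does nothing about the ill-defined encoding of the shared system tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of a database's locale-related settings. */
struct db_locale {
    const char *encoding;
    const char *collate;
    const char *ctype;
};

/* COLLATE and CTYPE must equal the template's unless the template is
 * template0. check_encoding applies the same rule to the encoding. */
static bool create_database_allowed(const char *template_name,
                                    const struct db_locale *tmpl,
                                    const struct db_locale *newdb,
                                    bool check_encoding)
{
    assert(template_name != NULL && tmpl != NULL && newdb != NULL);
    if (strcmp(template_name, "template0") == 0)
        return true;                  /* template0 is exempt */
    if (strcmp(tmpl->collate, newdb->collate) != 0 ||
        strcmp(tmpl->ctype, newdb->ctype) != 0)
        return false;
    if (check_encoding && strcmp(tmpl->encoding, newdb->encoding) != 0)
        return false;
    return true;
}
```

Cloning template1 with a different encoding passes when check_encoding is false and fails when it is true, while cloning template0 is allowed with any settings.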

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
