Re: Per-column collation, work in progress

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, work in progress
Date: 2010-09-23 09:03:03
Message-ID: 1285232583.27917.11.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Thu, 2010-09-23 at 10:12 +0200, Pavel Stehule wrote:
> 1. It doesn't work with the SQL 92 rules for the sortby list. I can
> understand that an explicit COLLATE doesn't work there, but the
> implicit one doesn't work either:
>
> CREATE TABLE foo(a text, b text COLLATE "cs_CZ.UTF8");
>
> SELECT * FROM foo ORDER BY 1; -- produces wrong order

I can't reproduce that. Please provide more details.

> 2. Why is the default encoding for a collation static? For Czech,
> both cs_CZ and cs_CZ.iso88592 are latin2, so any user with UTF8 has
> to write the encoding explicitly. But UTF8 is the more widely used and
> preferred encoding now. I think cs_CZ on a UTF8 database should mean
> cs_CZ.UTF8.

That's tweakable. One idea I had is to strip the ".utf8" suffix from
locale names when populating the pg_collation catalog, or create both
versions. I agree that the current way is a bit cumbersome.
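
A minimal sketch of the "create both versions" idea, written in Python only for brevity. The helper name collation_names and the suffix pattern are assumptions made for illustration; this is not code from the patch, and the real catalog population may well work differently.

```python
import re

# Hypothetical pattern: matches a trailing ".utf8" or ".UTF-8" style suffix.
_UTF8_SUFFIX = re.compile(r"\.utf-?8$", re.IGNORECASE)


def collation_names(locale_name):
    """Return the collation names to register for one OS locale name.

    The full locale name is always kept. If it ends in a UTF-8 encoding
    suffix, the stripped form is added as a second, shorter name.
    """
    stripped = _UTF8_SUFFIX.sub("", locale_name)
    if stripped == locale_name:
        # No UTF-8 suffix present, so there is nothing to strip.
        return [locale_name]
    return [locale_name, stripped]
```

Under this scheme a UTF8 database would see both "cs_CZ.utf8" and plain "cs_CZ" for the same locale, while names such as "cs_CZ.iso88592" would be left alone.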

> 3. postgres=# select to_char(current_date,'tmday') collate "cs_CZ.utf8";
> to_char
> ──────────
> thursday -- bad result
> (1 row)

As was already pointed out, collation only covers lc_collate and
lc_ctype. (It could cover other things, for example an application to
the money type was briefly discussed, but that's outside the current
mandate.)

As a point of order, what you wrote above attaches a collation to the
result of the function call. To get the collation to apply to the
function call itself, you have to put the collate clause on one of the
arguments, e.g.,

select to_char(current_date,'tmday' collate "cs_CZ.utf8");

> 4. Is there a ToDo list somewhere for the collation implementation?

At the moment it's mostly in the source code. I have a list of notes
locally that I can clean up and put in the wiki once we agree on the
general direction.

> 5.
>
> postgres=# create table xy(a text, b text collate "cs_CZ");
> ERROR: collation "cs_CZ" for current database encoding "UTF8" does not exist
>
> Could there be a friendlier message or hint, like "you cannot
> use a different encoding"? This collation is in the pg_collation catalog.

That can surely be polished.
