Re: Can ICU be used for a database's default sort order?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Can ICU be used for a database's default sort order?
Date: 2017-06-23 23:14:13
Message-ID: 17334.1498259653@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> 1) Associate by name only. That is, you can create a database with any
>> COLLATION "foo" that you want, and it's only checked when you first
>> connect to or do anything in the database.
>>
>> 2) Create shared collations. Then we'd need a way to manage having a
>> mix of shared and non-shared collations around.
>>
>> There are significant pros and cons to all of these ideas. Some people
>> I talked to appeared to prefer the shared collations approach.

> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

I'm not very clear on how you'd bootstrap template1 into anything
other than C locale in the second approach. With our existing
libc-based stuff, it's possible to define what the database's locale
is before there are any catalogs. It's not apparent how to do that with
a collation-based solution.

In my mind, collations are just a SQL-syntax wrapper for locales that
are really defined one level down. I think we'd be well advised to
carry that same approach into the database properties, because otherwise
we have circularities to deal with. So I'm imagining something more like

create database encoding 'utf8' lc_collate 'icu-en_US' lc_ctype ...

where lc_collate is just a string that we know how to interpret, the
same as now.

We could optionally reduce the amount of notation involved by merging the
lc_collate and lc_ctype parameters into one, say

create database encoding 'utf8' locale 'icu-en_US' ...

I'm not too clear on how this would play with other libc locale
functionality (lc_monetary and so on), but we'd have to deal with
that question anyway.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-06-23 23:32:33 Re: Small bug in replication lag tracking
Previous Message Michael Paquier 2017-06-23 23:09:52 Re: pg_terminate_backend can terminate background workers and autovacuum launchers