Re: Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)
Date: 2017-09-25 19:24:44
Message-ID: CAH2-WzmutFeBY4uPMdWuRKXt5RxA7zt7jE93e+sGfvKqqRLK3g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 25, 2017 at 11:42 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 9/25/17 00:24, Peter Geoghegan wrote:
>> * Creates root collation as root-x-icu (collcollate "root"), not
>> und-x-icu. "und" means undefined language.
>
> I'm curious about this point. "und" is defined in BCP 47. I don't see
> "root" defined anywhere. ICU converts the root collation to "und",
> AFAIK, so it seems to agree with the current naming.

In my patch, "root" is the string that is passed to ICU to get a language
tag. That string is technically in the old format.

I think that this is another ICU vs. UCA/CLDR thing (this causes much
confusion). Note that "root" is mentioned in the ICU locale explorer,
for example: https://ssl.icu-project.org/icu-bin/locexp

Note also that ucol_open() comments/docs say this:

* @param loc The locale containing the required collation rules.
* Special values for locales can be passed in -
* if NULL is passed for the locale, the default locale
* collation rules will be used. If empty string ("") or
* "root" are passed, UCA rules will be used.

I went with "root" because that produces a meaningful/useful display
name for pg_collation, and seems to be widely used elsewhere.

--
Peter Geoghegan
