Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Date: 2017-09-21 09:49:36
Message-ID: be9f0a2c-98dc-3915-6e1b-85a1cf1c0d8a@proxel.se
Lists: pgsql-hackers

On 09/21/2017 01:40 AM, Peter Geoghegan wrote:
> On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> pg_import_system_collations() takes care to use the non-BCP-47 style for
>>> such versions, so I think this is working correctly.
>>
>> But CREATE COLLATION doesn't use pg_import_system_collations().
>
> And perhaps more to the point: it's highly confusing that we use one or
> the other of those 2 things ("langtag"/BCP 47 tag or "name"/legacy
> locale name) as "colcollate", depending on ICU version, thereby
> *behaving* as if ICU < 54 really didn't know anything about BCP 47
> tags. Because, obviously earlier ICU versions know plenty about BCP
> 47, since 9 lines further down we use "langtag"/BCP 47 tag as collname
> for CollationCreate() -- regardless of ICU version.
>
> How can you say "ICU <54 doesn't even support the BCP 47 style", given
> all that? Those versions will still have locales named "*-x-icu" when
> users do "\dOS". Users will be highly confused when they quite
> reasonably try to generalize from the example in the docs and what
> "\dOS" shows, and get results that are wrong, often only in a very
> subtle way.

If we are fine with supporting only ICU 4.2 and later (which I think we
are, given that ICU 4.2 was released in 2009), then using
uloc_forLanguageTag()[1] to validate and canonicalize seems like the right
solution. I had missed that this function even existed when I last read
the documentation. Does it return a BCP 47 tag in modern versions of ICU?

I would strongly prefer that there be, as far as possible, only one
format for inputting ICU locales.

[1] http://www.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb

Andreas
