From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it? |
Date: | 2017-09-22 00:53:05 |
Message-ID: | CAH2-WznOpmJ+3xh6bvea_YUyd4ZdGiwG9ycE31Q09oU3XXw5vA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I really think we need to add some kind of debug mode that makes ICU
> optionally spit out a locale display name at key points. We need this
> to gain confidence that the behavior that ICU provides actually
> matches what is expected across different ICU versions for different
> locale formats. We talked about this as a user-facing feature before,
> which can wait till v11; I just want this to debug problems like this
> one.
I patched CREATE COLLATION to show the ICU display name, which produces
output like this:
postgres=# create collation basque (provider=icu,
locale='eu-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE: 00000: ICU collation description is "Basque
(colcasefirst=upper, Sort Order=European Ordering Rules,
colnumeric=yes, colreorder=latn-digit, em=emoji)"
CREATE COLLATION
I used an ISO 639-1 language code (2 letter language code) above,
which, as we can see, is recognized as Basque. ICU is also fine with
the 3 letter ISO 639-2 code "eus-*", recognizing that as Basque, too.
If I use an ISO 639-2 code for Basque that ICU/CLDR doesn't like,
"baq-*", I can see that my expectations have only partially been met,
since the notice doesn't say anything about the language Basque:
postgres=# create collation actually_not_basque (provider=icu,
locale='baq-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE: 00000: ICU collation description is "baq (colcasefirst=upper,
Sort Order=European Ordering Rules, colnumeric=yes,
colreorder=latn-digit, em=emoji)"
CREATE COLLATION
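To make the two tags above easier to compare, here is a minimal Python sketch that splits a BCP 47 tag into its leading language subtag and the key/value pairs of its "-u-" extension. This is not how ICU parses tags, and split_unicode_extension is a hypothetical helper name; it assumes a well-formed tag, ignores script and region subtags, and skips any attributes that precede the first key.

```python
def split_unicode_extension(tag):
    """Return (language, keywords) for a BCP 47 tag.

    Simplified sketch: only the first subtag is treated as the language,
    and only the "-u-" extension is examined. Hypothetical helper, not
    part of ICU or PostgreSQL.
    """
    subtags = tag.lower().split("-")
    language = subtags[0]
    collected = {}

    if "u" not in subtags:
        return language, {}

    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            # Another singleton starts a different extension; stop here.
            break
        if len(subtag) == 2:
            # Two-character subtags are keys (kf, kr, em, kn, co, ...).
            key = subtag
            collected[key] = []
        elif key is not None:
            # Longer subtags are values of the most recent key.
            collected[key].append(subtag)

    keywords = {k: "-".join(v) for k, v in collected.items()}
    return language, keywords


basque = split_unicode_extension(
    "eu-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor")
not_basque = split_unicode_extension(
    "baq-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor")

print(basque)
print(not_basque)
```

Under those assumptions the keyword part comes out identical for both tags, and only the language subtag differs ("eu" versus "baq"), which lines up with the two NOTICE messages: the collation settings are described the same way in both, but only the first one names the language.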
Functionality like this is starting to look essential to me, rather
than just a nice-to-have. Having this NOTICE would have made me
realize our problems with ICU versions < 54 much earlier, if nothing
else. If the purpose of NOTICE messages is to "Provide[s] information
that might be helpful to users", then I'd say that this definitely
qualifies. And the extra code is trivial (we already get the display
name in the context of initdb). I strongly recommend that we slip this into
v10, as part of fixing the problem with language tags that earlier ICU
versions have.
--
Peter Geoghegan