Re: Create collation reporting the ICU locale display name

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Create collation reporting the ICU locale display name
Date: 2019-09-14 20:46:01
Message-ID: CAH2-Wzmo3jt6h0BEBYxDfxMJ+pcg7eCJxR3PNpg0XMsBap+iaQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 14, 2019 at 8:13 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The advantage of describe_collation(oid) is that we would not be
> building knowledge into the callers about which columns of pg_collation
> matter for this purpose. I'm not even convinced that the two you posit
> here are sufficient --- the encoding seems relevant, for instance.

+1. It seems like a good idea to consider the ICU display name to be
just that -- a display name. It should be considered a dynamic thing.
For one thing, it is subject to localization, so it isn't fixed even
when nothing changes internally. But there is also the question of
external changes. Internationalization is inherently a squishy
business.

I believe that the main goal of BCP 47 (i.e. the locale strings that ICU
accepts in CREATE COLLATION) is to fail gracefully when cultural or
political developments occur that change the expectations of users. BCP 47
is actually an IETF standard; it's not from the Unicode consortium, or
from ICU. It is supposed to be highly forgiving -- this is a feature,
not a bug. Of course, many facets of a locale control things that we
don't care about, or at least don't use ICU for. For example, the
locale controls the default currency symbol.

There are pg_upgrade scenarios in which the display string for a
collation will legitimately change due to external changes. For
example, somebody who lived in Serbia and Montenegro (a country which
ceased to exist in 2006) could have used a locale string with the region
code "CS" (an ISO 3166-1 code), which has since been deprecated [1]. If memory serves,
there is a 5 year grace period codified by some ISO standard or other,
so recent ICU versions know nothing about Serbia and Montenegro
specifically. But they'll still recognize the Serbian language code,
as well as language codes for minority languages spoken in Serbia and
Montenegro. So, for the most part, the impact of sticking with this
old, somewhat inaccurate locale definition string is minimal.
(Actually, maybe downgrade scenarios are more interesting in
practice.)
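To make the "fail gracefully" point a little more concrete, here is a toy sketch in Python. It is a hypothetical model, not how ICU actually canonicalizes tags: the region table and the names parse_tag and effective_locale are invented for illustration. It splits a simple language[-script][-region] tag and, when the region subtag is no longer recognized (like the deprecated "CS"), drops just that subtag while keeping the language and script.

```python
# Toy model of graceful degradation for simple BCP 47 style tags.
# ASSUMPTION: KNOWN_REGIONS is a tiny hypothetical table, not real ICU/CLDR data.
from dataclasses import dataclass, field
from typing import List, Optional

KNOWN_REGIONS = {"RS", "ME"}  # Serbia, Montenegro; "CS" deliberately absent


@dataclass
class ParsedTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    dropped: List[str] = field(default_factory=list)  # subtags we ignored


def parse_tag(tag: str) -> ParsedTag:
    """Parse language[-script][-region], ignoring unrecognized subtags."""
    parts = tag.replace("_", "-").split("-")
    parsed = ParsedTag(language=parts[0].lower())
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha() and parsed.script is None and parsed.region is None:
            parsed.script = sub.title()  # e.g. "Latn"
        elif len(sub) == 2 and sub.isalpha() and parsed.region is None:
            if sub.upper() in KNOWN_REGIONS:
                parsed.region = sub.upper()
            else:
                parsed.dropped.append(sub.upper())  # deprecated/unknown region
        else:
            parsed.dropped.append(sub)
    return parsed


def effective_locale(tag: str) -> str:
    """Rebuild the tag from the subtags that survived parsing."""
    p = parse_tag(tag)
    return "-".join(s for s in (p.language, p.script, p.region) if s)


if __name__ == "__main__":
    for t in ("sr-Latn-CS", "sr_RS"):
        print(t, "->", effective_locale(t), parse_tag(t).dropped)
```

With this model, "sr-Latn-CS" still resolves to Serbian in Latin script ("sr-Latn"); only the obsolete region is lost, which is the sense in which an old locale string keeps working.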

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS#Codes_deleted_in_Newsletter_I-8
--
Peter Geoghegan
