Re: Move defaults toward ICU in 16?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Move defaults toward ICU in 16?
Date: 2023-02-02 22:48:05
Message-ID: CAH2-Wzmu2PBPWaYHqeogG8ZL426uABdHRYL-tex1Y5iPWfTNmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 2, 2023 at 2:15 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Sure, there could be a clean-room implementation that replaces it in
> some sense (just as there is a Java implementation) but it would very
> likely be "the same" because the real thing we're buying here is the
> set of algorithms and data maintenance that the whole industry has
> agreed on.

I don't think that a clean room implementation is implausible. They
seem to already exist, and be explicitly provided for by CLDR, which
is not joined at the hip to ICU:

https://github.com/elixir-cldr/cldr

Most of the value that we tend to think of as coming from ICU actually
comes from CLDR itself, as well as related Unicode Consortium and IETF
standards/RFCs such as BCP-47.

> Unless Britain decides to exit the Latin alphabet, terminate
> membership of ISO and revert to anglo-saxon runes with a sort order
> that is defined in the new constitution as "the opposite of whatever
> Unicode says", it's hard to see obstacles to ICU's long term universal
> applicability.

It would have to literally be defined as "not unicode" for it to
present a real problem. A key goal of Unicode is to accommodate
political and cultural shifts, since even countries can come and go.
In principle Unicode should be able to accommodate just about any
change in preferences, except when there is an irreconcilable
difference of opinion among people that are from the same natural
language group. For example it can accommodate relatively minor
differences of opinion about how text should be sorted among groups
that each speak a regional dialect of the same language. Hardly
anybody even notices this.

Accommodating these variations can only come from making a huge
investment. Most of the work is actually done by natural language
scholars, not technologists. That effort is very unlikely to be
duplicated by some other group with its own conflicting goals. AFAICT
there is no great need for any schisms, since differences of opinion
can usually be accommodated under the umbrella of Unicode.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-02-02 23:10:49 Re: Move defaults toward ICU in 16?
Previous Message Peter Eisentraut 2023-02-02 22:39:59 Re: Underscores in numeric literals