Re: Move defaults toward ICU in 16?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Move defaults toward ICU in 16?
Date: 2023-02-02 22:14:40
Message-ID: CA+hUKGLd3ESz2yZ69HB6TO2E9J0Ku_eBJ5AddPHQvW+iPZ-puA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 3, 2023 at 5:31 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2023-02-02 at 08:44 -0500, Robert Haas wrote:
> > On Thu, Feb 2, 2023 at 8:13 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > > If we don't want to nudge users toward ICU, is it because we are
> > > waiting for something, or is there a lack of consensus that ICU is
> > > actually better?
> >
> > Do you think it's better?
>
> Yes:
>
> * ICU is more featureful: e.g. supports case-insensitive collations (the
> citext docs suggest looking at ICU instead).
> * It's faster: a simple non-contrived sort is something like 70%
> faster[1] than one using glibc.
> * It can provide consistent semantics across platforms.

+1

> * Easier for users to control what library version is available on
> their system. We can also ask packagers to keep some old versions of
> ICU available for an extended period of time.
> * If one of the ICU multilib patches makes it in, it will be easier
> for users to select which of the library versions Postgres will use.
> * Reports versions for individual collators, distinct from the
> library version.

+1

> The biggest disadvantage (rather, the flip side of its advantages) is
> that it's a separate dependency. Will ICU still be maintained in 10
> years or will we end up stuck maintaining it ourselves? Then again,
> we've already been shipping it, so I don't know if we can avoid that
> problem entirely now even if we wanted to.

It has a pretty special status, with an absolutely enormous amount of
technology depending on it.

http://blog.unicode.org/2016/05/icu-joins-unicode-consortium.html
https://unicode.org/consortium/consort.html
https://home.unicode.org/membership/members/
https://home.unicode.org/about-unicode/

I mean, who knows what the future holds, but ultimately what we're
doing here is taking the de facto reference implementation of the
Unicode collation algorithm. Are Unicode and the consortium still
going to be here in 10 years? We're all in on Unicode, and it's also
tangled up with ISO standards, as are parts of the collation stuff.
Sure, there could be a clean-room implementation that replaces it in
some sense (just as there is a Java implementation) but it would very
likely be "the same" because the real thing we're buying here is the
set of algorithms and data maintenance that the whole industry has
agreed on.

Unless Britain decides to exit the Latin alphabet, terminate its
membership of ISO and revert to Anglo-Saxon runes with a sort order
that is defined in the new constitution as "the opposite of whatever
Unicode says", it's hard to see obstacles to ICU's long-term universal
applicability.

It's still important to have libc support as an option, though,
because it's totally reasonable to want the sort order to agree
with the "sort" command on the same host, provided you are willing to
deal with all the complexities that we're trying to escape.
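
To make the libc point concrete, here is a rough sketch (not anything
from PostgreSQL itself) of what "agreeing with the host" means. It uses
Python's standard locale module, which wraps the C library's setlocale()
and strxfrm(). The helper name libc_sorted is made up for illustration,
and only the "C" locale is assumed to exist everywhere; any other locale
name depends on what is installed on the machine.

```python
import locale


def libc_sorted(strings, locale_name=""):
    """Sort strings using the host C library's collation for locale_name.

    An empty locale_name means "take it from the environment" (LC_ALL,
    LC_COLLATE, LANG), which is also where the sort command looks.
    libc_sorted is a hypothetical helper, not a PostgreSQL or ICU API.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm() produces keys whose byte order matches strcoll() order.
        return sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


if __name__ == "__main__":
    words = ["banana", "Apple", "cherry", "apple"]
    # The "C" locale compares by code point, so uppercase sorts first.
    print(libc_sorted(words, "C"))
    # Whatever the environment says; the result varies by host and by
    # libc version, which is exactly the complexity discussed above.
    print(libc_sorted(words))
```

The second call is the interesting one: its output can differ between
hosts and between libc releases, whereas an ICU collator of a given
version is meant to behave the same everywhere.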
