Re: Move defaults toward ICU in 16?

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Move defaults toward ICU in 16?
Date: 2023-02-14 21:27:50
Message-ID: 46d615da-ede5-ddc4-af50-d71ff1587ec8@postgresql.org
Lists: pgsql-hackers

On 2/13/23 8:11 PM, Jeff Davis wrote:
> On Thu, 2023-02-02 at 05:13 -0800, Jeff Davis wrote:
>> As a project, do we want to nudge users toward ICU as the collation
>> provider as the best practice going forward?
>
> One consideration here is security. Any vulnerability in ICU collation
> routines could easily become a vulnerability in Postgres.

Would it be any different than a vulnerability in OpenSSL et al? I know
that's a general, nuanced question but it would be good to understand if
we are exposing ourselves to any more vulnerabilities. And would it be
any different than today, given people can build PG with libicu as is?

Continuing on $SUBJECT, I wanted to understand performance comparisons.
I saw your comments[1] in response to Robert's question, looked at your
benchmarks[2] and one that ICU ran on older versions[3]. It seems that
in general, users would see performance gains switching to ICU. The only
result in [3] that stood out to me was that the "ko_KR" collation
underperformed on a list of Korean names, but maybe that is better in
newer versions.

I agree with most of your points in [1]. The platform-consistent
behavior is a good point, especially with more PG deployments running on
different systems. While taking on a new dependency is a concern, ICU
was released in 1999[4], has an active community, and seems to follow
standards (i.e. the Unicode Consortium).

I do wonder about upgrades, beyond the ongoing work with pg_upgrade. I
think the logical methods (pg_dumpall, logical replication) should
generally be OK, but we should ensure we think of things that could go
wrong and how we'd answer them.

Based on the available data, I think it's OK to move towards ICU as the
default, or preferred, collation provider. I agree (for now) with not
taking a hard dependency on ICU.

Thanks,

Jonathan

[1]
https://www.postgresql.org/message-id/b676252eeb57ab8da9dbb411d0ccace95caeda0a.camel%40j-davis.com
[2]
https://www.postgresql.org/message-id/64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com
[3] https://icu.unicode.org/charts/collation-icu4c48-glibc
[4] https://en.wikipedia.org/wiki/International_Components_for_Unicode
