Re: Move defaults toward ICU in 16?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Move defaults toward ICU in 16?
Date: 2023-02-11 00:17:00
Message-ID: 5ffeafe843d43a975f31479fbd211ab88404841b.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2023-02-08 at 18:22 -0800, Andres Freund wrote:
> On 2023-02-08 12:16:46 -0800, Jeff Davis wrote:
> > On Thu, 2023-02-02 at 18:10 -0500, Tom Lane wrote:
> > > Yeah.  I would be resistant to making ICU a required dependency,
> > > but it doesn't seem unreasonable to start moving towards it being
> > > our default collation support.
> >
> > Patch attached.
>
> Unfortunately this fails widely on CI, with both compile time and
> runtime

New patches attached.

0001: build defaults to requiring ICU
0002: initdb defaults to using ICU (if server built with ICU)

One CI test is failing: "Windows - Server 2019, VS 2019 - Meson &
ninja"; if I apply Andres patch (
https://github.com/anarazel/postgres/commit/dde7c68 ), then it works.

I ran into one annoyance with pg_upgrade, which is that a v15 cluster
initialized with the defaults requires that the v16 cluster is
initialized with --locale-provider=libc, because otherwise the old and
new cluster will have mismatching template databases. Simple to fix
once you see the error, but I wonder how many initdb scripts might be
broken? I suppose it's just the cost of changing a default? Would an
environment variable help for cases where it's difficult to pass that
extra option down through a script?

I also considered posting another patch to change the default for
CREATE COLLATION, but there are a few issues I'm not sure about. Should
the default be based on whether ICU support is available? Or the
datlocprovider for the current database? And/or some kind of
compatibility GUC?

Notes on the tests I needed to fix, in case they are interesting or
point to some kind of larger problem:

* ecpg has a test that involves setting the client_encoding to LATIN1
which required a compatible server encoding so it was setting
ENCODING=SQL_ASCII, which ICU doesn't support. The ecpg test did not
look particularly sensitive to the locale, so I changed it to use
client_encoding=SQL_ASCII instead, so that the server encoding doesn't
matter.
* citext has a test involving Turkish characters, which works for all
libc locales, but in ICU the test only works in Turkish locales. I skip
the test if datlocprovider='i', because citext doesn't seem very
important in an ICU world.
* unaccent is broken if the database provider is ICU and LC_CTYPE=C,
because the t_isspace() (etc.) functions do not properly handle ICU.
Probably some other things are broken with that combination, but only
this test seems to exercise it. I just skipped the test for that broken
combination, but perhaps it should be fixed in the future.
* initdb was being built with ICU as a dependency in meson, but not
autoconf. I assume it's fine to link ICU into initdb, so I changed the
Makefile.
* I changed a couple tests to initialize with --locale-provider=libc.
They were testing that creating a database with the ICU provider but no
ICU locale fails, and that's easiest to test if the template is libc.
* The CI test CompilerWarnings:mingw_cross_warning was failing because
ICU is not available. I added --without-icu in the .cirrus.yml file and
it works.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
v2-0001-Build-ICU-support-by-default.patch text/x-patch 12.9 KB
v2-0002-Use-ICU-by-default-at-initdb-time.patch text/x-patch 17.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-02-11 01:14:53 Re: UUID v7
Previous Message Andrey Borodin 2023-02-10 23:57:50 UUID v7