Re: Unicode update and some tooling improvements

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode update and some tooling improvements
Date: 2026-03-18 14:20:40
Message-ID: lmj5ju4omjr3iswibu477ybipljzzbe4pmnp3oa2rs5gxzanmb@eph27jugatgg
Lists: pgsql-hackers

Hi,

On 2026-02-26 21:36:08 +0100, Peter Eisentraut wrote:
> This is the annual update of the Unicode data. I also worked a bit on the
> tooling. The update-unicode target under meson did not update the data in
> contrib/unaccent/, so I added that. I also fixed a Python deprecation
> warning in the generation script and made some light changes in the
> surrounding documentation.

> From ef15b16dcef7a3868fc37744d201bb233f8271bd Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:36:27 +0100
> Subject: [PATCH 3/6] Implement unaccent Unicode data update in meson
>
> The meson/ninja update-unicode target did not cover the required
> updates in contrib/unaccent/. This is fixed now.

Makes sense.

> +# Download CLDR files on demand.
> +
> +cldr_baseurl = 'https://raw.githubusercontent.com/unicode-org/cldr/release-@0@/common/transforms/@1@'

Hm. I take it the relevant contents aren't available on unicode.org, which we
use in src/common/unicode?

We reference githubusercontent.com in Makefile too, but somehow that feels a
bit weird.

> +if not wget.found() or not cp.found()
> + subdir_done()
> +endif
> +
> +foreach f : ['Latin-ASCII.xml']
> + # XXX .replace requires meson 0.58
> + url = cldr_baseurl.format(CLDR_VERSION.replace('.', '-'), f)

I think this could be replaced with something like
CLDR_VERSION.split('.').join('-')
for < 0.58 compat. But I'm also ok with going to 0.58.
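
For illustration, an untested sketch of the split/join variant as a fragment for an existing meson.build. In meson, join() is a method of the separator string and takes the list as its argument, so the spelling differs slightly from the one-liner above. The value assigned to CLDR_VERSION and the name cldr_tag are assumptions made up for this example.

```meson
# Assumed value for illustration only; the real one is defined elsewhere.
CLDR_VERSION = '48.1'

# With meson >= 0.58 this could be written as:
#   cldr_tag = CLDR_VERSION.replace('.', '-')
# Without str.replace(): split on '.', then join the pieces with '-'.
cldr_tag = '-'.join(CLDR_VERSION.split('.'))

# Same two-placeholder format string as in the patch.
cldr_baseurl = 'https://raw.githubusercontent.com/unicode-org/cldr/release-@0@/common/transforms/@1@'
url = cldr_baseurl.format(cldr_tag, 'Latin-ASCII.xml')
```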

> From 20d5a665f72b3ddde8bfdf06b94d218da0dc2d09 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:38:16 +0100
> Subject: [PATCH 4/6] Update RELEASE_CHANGES
>
> The existing instructions did not cover meson. Point to
> src/common/unicode/README instead, where there is more information.

LGTM.

> From 868e269b518daf0d3d288e2e379d5fd3ad215f49 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 10:25:48 +0100
> Subject: [PATCH 5/6] Update Unicode data to CLDR 48.1
>
> No actual changes result.
>
> XXX should change that to CLDR 49 in April

48.2 has been released from what I can tell.

LGTM otherwise.

> From dd4b5ced419b319c24fa0928180e54d7317e1690 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:38:51 +0100
> Subject: [PATCH 6/6] Update Unicode data to Unicode 17.0.0

Looks like 18 is out, any reason to not go straight to that?

> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
> index 7d65e428607..b99116a9ef8 100644
> --- a/src/Makefile.global.in
> +++ b/src/Makefile.global.in
> @@ -376,7 +376,7 @@ DOWNLOAD = wget -O $@ --no-use-server-timestamps
> # Pick a release from here: <https://www.unicode.org/Public/>. Note
> # that the most recent release listed there is often a pre-release;
> # don't pick that one, except for testing.
> -UNICODE_VERSION = 16.0.0
> +UNICODE_VERSION = 17.0.0

Wonder if we, in a separate change, should put UNICODE_VERSION and
CLDR_VERSION in dedicated files (probably just named
UNICODE_VERSION/CLDR_VERSION) that then could be shared by meson & make.
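
As a rough, untested sketch of the meson side of that idea: the fs module can read a small text file at configure time. The file names and their location next to the meson.build are hypothetical, and fs.read() needs meson 0.57 or newer as far as I know, which would have to be checked against the minimum supported version. The make side could presumably read the same files with something like $(shell cat ...).

```meson
fs = import('fs')

# Hypothetical files containing just the version string, e.g. "17.0.0".
# Paths are resolved relative to the directory of this meson.build.
UNICODE_VERSION = fs.read('UNICODE_VERSION').strip()
CLDR_VERSION = fs.read('CLDR_VERSION').strip()
```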

Greetings,

Andres Freund
