| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Unicode update and some tooling improvements |
| Date: | 2026-03-18 14:20:40 |
| Message-ID: | lmj5ju4omjr3iswibu477ybipljzzbe4pmnp3oa2rs5gxzanmb@eph27jugatgg |
| Lists: | pgsql-hackers |
Hi,
On 2026-02-26 21:36:08 +0100, Peter Eisentraut wrote:
> This is the annual update of the Unicode data. I also worked a bit on the
> tooling. The update-unicode target under meson did not update the data in
> contrib/unaccent/, so I added that. I also fixed a Python deprecation
> warning in the generation script and made some light changes in the
> surrounding documentation.
> From ef15b16dcef7a3868fc37744d201bb233f8271bd Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:36:27 +0100
> Subject: [PATCH 3/6] Implement unaccent Unicode data update in meson
>
> The meson/ninja update-unicode target did not cover the required
> updates in contrib/unaccent/. This is fixed now.
Makes sense.
> +# Download CLDR files on demand.
> +
> +cldr_baseurl = 'https://raw.githubusercontent.com/unicode-org/cldr/release-@0@/common/transforms/@1@'
Hm. I take it the relevant contents aren't available on unicode.org, which we
use in src/common/unicode?
We reference githubusercontent.com in the Makefile too, but somehow that feels a
bit weird.
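For reference, meson's `str.format()` substitutes `@0@`, `@1@`, ... positionally. A minimal Python sketch of that substitution applied to the CLDR template above; the helper name `meson_format` is hypothetical, and the version is assumed to already be in the dashed form the patch produces (`48-1` for CLDR_VERSION 48.1):

```python
import re

# Template copied from the patch; @0@ is the dashed CLDR version, @1@ the file name.
CLDR_BASEURL = ('https://raw.githubusercontent.com/unicode-org/cldr/'
                'release-@0@/common/transforms/@1@')


def meson_format(template: str, *args: str) -> str:
    """Hypothetical stand-in for meson's str.format(): replace @N@ with args[N]."""
    return re.sub(r'@(\d+)@', lambda m: args[int(m.group(1))], template)


url = meson_format(CLDR_BASEURL, '48-1', 'Latin-ASCII.xml')
print(url)
```

This only shows which URL the build would try to fetch; whether unicode.org offers the same transform files is the open question above.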
> +if not wget.found() or not cp.found()
> + subdir_done()
> +endif
> +
> +foreach f : ['Latin-ASCII.xml']
> + # XXX .replace requires meson 0.58
> + url = cldr_baseurl.format(CLDR_VERSION.replace('.', '-'), f)
I think this could be replaced with something like
'-'.join(CLDR_VERSION.split('.'))
for compat with meson < 0.58. But I'm also ok with requiring 0.58.
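A quick Python check that splitting on `.` and joining with `-` yields the same string as `replace('.', '-')` for CLDR-style version strings (Python, like meson, calls `join()` on the separator string). This is an illustration of the equivalence, not meson code:

```python
def dashed_replace(version: str) -> str:
    # Mirrors meson's CLDR_VERSION.replace('.', '-'), which needs meson >= 0.58
    return version.replace('.', '-')


def dashed_split_join(version: str) -> str:
    # Mirrors meson's '-'.join(CLDR_VERSION.split('.')), usable on older meson
    return '-'.join(version.split('.'))


for v in ['48.1', '48.2', '49']:
    assert dashed_replace(v) == dashed_split_join(v)
    print(v, '->', dashed_split_join(v))
```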
> From 20d5a665f72b3ddde8bfdf06b94d218da0dc2d09 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:38:16 +0100
> Subject: [PATCH 4/6] Update RELEASE_CHANGES
>
> The existing instructions did not cover meson. Point to
> src/common/unicode/README instead, where there is more information.
LGTM.
> From 868e269b518daf0d3d288e2e379d5fd3ad215f49 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 10:25:48 +0100
> Subject: [PATCH 5/6] Update Unicode data to CLDR 48.1
>
> No actual changes result.
>
> XXX should change that to CLDR 49 in April
48.2 has been released from what I can tell.
LGTM otherwise.
> From dd4b5ced419b319c24fa0928180e54d7317e1690 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 26 Feb 2026 11:38:51 +0100
> Subject: [PATCH 6/6] Update Unicode data to Unicode 17.0.0
Looks like 18 is out, any reason to not go straight to that?
> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
> index 7d65e428607..b99116a9ef8 100644
> --- a/src/Makefile.global.in
> +++ b/src/Makefile.global.in
> @@ -376,7 +376,7 @@ DOWNLOAD = wget -O $@ --no-use-server-timestamps
> # Pick a release from here: <https://www.unicode.org/Public/>. Note
> # that the most recent release listed there is often a pre-release;
> # don't pick that one, except for testing.
> -UNICODE_VERSION = 16.0.0
> +UNICODE_VERSION = 17.0.0
Wonder if we, in a separate change, should put UNICODE_VERSION and
CLDR_VERSION in dedicated files (probably just named
UNICODE_VERSION/CLDR_VERSION) that could then be shared by meson & make.
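A minimal sketch of reading such a file, assuming a hypothetical one-line file named `UNICODE_VERSION` (or `CLDR_VERSION`) that holds only the dotted version; the file layout and the validation rule are assumptions, not an agreed format. On the build side, meson's `fs` module `read()` and make's `$(shell cat ...)` are the obvious ways to load the same file.

```python
import re
import tempfile
from pathlib import Path

# Assumed format: a dotted numeric version such as "17.0.0" or "48.1"
VERSION_RE = re.compile(r'\d+(\.\d+)*')


def read_version_file(path: Path) -> str:
    """Read a one-line version file and return the stripped version string."""
    text = path.read_text(encoding='utf-8').strip()
    if not VERSION_RE.fullmatch(text):
        raise ValueError(f'{path}: unexpected version string {text!r}')
    return text


if __name__ == '__main__':
    # Round-trip through a temporary file named like the proposed version file
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / 'UNICODE_VERSION'
        p.write_text('17.0.0\n', encoding='utf-8')
        print(read_version_file(p))
```

Keeping the file to a bare version string means neither build system needs any parsing beyond stripping the trailing newline.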
Greetings,
Andres Freund