Re: Order changes in PG16 since ICU introduction

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-05-22 20:09:00
Message-ID: cb448574-aa7c-4969-b2dd-c9eb221d7e06@manitou-mail.org
Lists: pgsql-hackers

Jeff Davis wrote:

> If we special case locale=C, but do nothing for locale=fr_FR, then I'm
> not sure we've solved the problem. Andrew Gierth raised the issue here,
> which he called "maximally confusing":
>
> https://postgr.es/m/874jp9f5jo.fsf@news-spur.riddles.org.uk
>
> That's why I feel that we need to make locale apply to whatever the
> provider is, not just when it happens to be C.

While I agree that the LOCALE option in CREATE DATABASE is
counter-intuitive, I find it questionable that blending ICU
and libc locales into it helps that much with the user experience.

Trying the latest v6-* patches applied on top of 722541ead1
(before the pgindent run), here are a few examples where I
don't think it goes well.

The OS is Ubuntu 22.04 (glibc 2.35, ICU 70.1)

initdb:

Using default ICU locale "fr".
Using language tag "fr" for ICU locale "fr".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: fr
LC_COLLATE: fr_FR.UTF-8
LC_CTYPE: fr_FR.UTF-8
LC_MESSAGES: fr_FR.UTF-8
LC_MONETARY: fr_FR.UTF-8
LC_NUMERIC: fr_FR.UTF-8
LC_TIME: fr_FR.UTF-8
The default database encoding has accordingly been set to "UTF8".

#1

postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of the template database (fr)
HINT: Use the same ICU locale as in the template database, or use template0 as template.

That looks like a fairly generic case that doesn't work seamlessly.

#2

postgres=# create database test2 locale='C.UTF-8' template='template0';
NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
CREATE DATABASE

en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
this interpretation is arguably not what a user would expect.

I would expect the ICU warning or error (icu_validation_level) to kick
in instead of that transliteration.
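
One way to check the ordering difference directly is to sort the same strings under both collations. This is only a sketch: the collation names c_utf8_libc and posix_icu and the sample strings are made up for illustration, and it assumes a UTF8 database on a system where the libc locale C.UTF-8 exists and ICU support is compiled in.

```sql
-- Hypothetical collation names, created only for this comparison.
-- The ICU locale string is the one reported in the NOTICE above.
CREATE COLLATION c_utf8_libc (provider = libc, locale = 'C.UTF-8');
CREATE COLLATION posix_icu (provider = icu, locale = 'en-US-u-va-posix');

-- Rank the same sample strings under each collation; rows where
-- rank_libc and rank_icu differ are the ones that sort differently.
SELECT s,
       rank() OVER (ORDER BY s COLLATE c_utf8_libc) AS rank_libc,
       rank() OVER (ORDER BY s COLLATE posix_icu)   AS rank_icu
FROM (VALUES ('a'), ('B'), ('é'), ('z'), ('Z'), ('_')) AS v(s)
ORDER BY rank_libc;
```

The sample set deliberately mixes ASCII letters, punctuation and one non-ASCII letter, since those are the places where the two providers seem most likely to disagree.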

#3

$ grep french /etc/locale.alias
french fr_FR.ISO-8859-1

postgres=# create database test3 locale='french' template='template0' encoding='LATIN1';
WARNING: ICU locale "french" has unknown language "french"
HINT: To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE DATABASE

In practice we're probably getting the "und" ICU locale whereas "fr" would
be appropriate.
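
To see what actually got recorded for that database, the catalog can be queried. A sketch, assuming the PG16 column names in pg_database (datlocprovider, daticulocale); it shows the stored locale string, not which locale ICU ends up resolving it to at runtime.

```sql
-- Inspect the provider and locale settings stored for the new database.
-- Column names are assumed to be those of the PG16 catalog.
SELECT datname,
       datlocprovider,   -- 'i' for icu, 'c' for libc
       daticulocale,
       datcollate,
       datctype,
       pg_encoding_to_char(encoding) AS encoding
FROM pg_database
WHERE datname = 'test3';
```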

I assume that we would find more cases like that if testing on many
operating systems.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
