Re: Order changes in PG16 since ICU introduction

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-05-19 19:13:47
Message-ID: 360c90b9-7c20-4cec-aade-38e6e3351c05@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis wrote:

> 2) Automatically change the provider to libc when locale=C.
>
> Almost works, but it's not clear how we handle the case "provider=icu
> lc_collate='fr_FR.utf8' locale=C".
>
> If we change it to "provider=libc lc_collate=C", we've overridden the
> specified lc_collate. If we ignore the locale=C, that would be
> surprising to users. If we throw an error, that would be a backwards
> compatibility issue.

This thread started with a report illustrating that when users mention
the locale "C", they implicitly mean "C" from the libc provider, as
when libc was the default. The problem is that as soon as ICU is the
default, any reference to a libc collation should mention explicitly
that the provider is libc.

It seems what we're set on the idea to create an exception for "C"
(and I assume also "POSIX") to avoid too much confusion, and because
"C" is quite special anyway, and has no equivalent in ICU (the switch
in v16 to ICU as the default provider is based on the premise that the
locales with the same name will behave pretty much the same with ICU
as they did with libc, but it's absolutely not the case with "C").

ISTM that if we want to go that route, we need the make the minimum
changes at the user interface level and not any deeper, so that when
(locale="C" OR locale="POSIX") AND the provider has not been specified,
then the command (initdb and create database) act as if the user had
specified provider=libc.

> (3) Support iculocale=C in the ICU provider using the memcmp() path.

> In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
> lc_ctpye_is_c() would both return true.

ICU does not provide a locale that behaves like that, and it doesn't
feel right to pretend it does. It feels like attacking the problem
at the wrong level.

> (4) Create a new "none" provider (which has no locale and always memcmp
> semantics), and automatically change the provider to "none" if
> provider=icu and iculocale=C.

It still uses libc/C for character classification and case changing,
so "no locale" is technically not true. Personally I don't see
the benefit of adding a "none" provider. C is a libc locale
and libc is not disappearing. I also think that when users explicitly
indicate provider=icu, they should get icu.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-05-19 19:29:16 Re: Assert failure of the cross-check for nullingrels
Previous Message Aleksander Alekseev 2023-05-19 18:42:50 Re: The documentation for READ COMMITTED may be incomplete or wrong