Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-04-28 21:35:25
Message-ID: 654a49f7ff7461bcf47be4181430678d45f93858.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote:
> This should be pg_strcasecmp(...) == 0

Good catch, thank you! Fixed in updated patches.

> postgres=# create database lat9 locale 'fr_FR(at)euro' encoding LATIN9
> template
> 'template0';
> ERROR:  could not convert locale name "fr_FR(at)euro" to language tag:
> U_ILLEGAL_ARGUMENT_ERROR

ICU 63 and earlier convert it without error to the language tag 'fr-FR-
u-cu-eur', which is correct. ICU 64 removed support for transforming
some locale variants, because apparently they think those variants are
obsolete:

https://unicode-org.atlassian.net/browse/ICU-22268
https://unicode-org.atlassian.net/browse/ICU-20187

(Aside: how obsolete are those variants?)

It's frustrating that they'd remove such transformations from the
canonicalization process.

Fortunately, it looks like it's easy enough to do the transformation
ourselves. The only problematic format is '(dot)(dot)(dot)(at)VARIANT'. The other
format 'fr_FR_EURO' doesn't seem to be a valid glibc locale name[1] and
windows seems to use BCP 47[2].

And there don't seem to be a lot of variants to handle. ICU 63 only
handles 3 variants, so that's what my patch does. Any unknown variant
between 5 and 8 characters won't throw an error. There could be more
problem cases, but I'm not sure how much of a practical problem they
are.

If we try to keep the meaning of LOCALE to only LC_COLLATE and
LC_CTYPE, that will continue to be confusing for anyone that uses
provider=icu.

Regards,
Jeff Davis

[1]
https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html
[2]
https://learn.microsoft.com/en-us/windows/win32/intl/locale-names

Attachment Content-Type Size
v3-0001-ICU-do-not-convert-locale-C-to-en-US-u-va-posix.patch text/x-patch 6.9 KB
v3-0002-ICU-support-locale-C-with-the-same-behavior-as-li.patch text/x-patch 12.2 KB
v3-0003-ICU-fix-up-old-libc-style-locale-strings.patch text/x-patch 9.6 KB
v3-0004-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch text/x-patch 16.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-04-29 08:09:13 Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Previous Message Roberto Mello 2023-04-28 18:22:34 Re: Postgres Version want to update from 9.2 to 9.5 version in CentOS 7.9