Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>
Cc: 'Peter Eisentraut' <peter(dot)eisentraut(at)enterprisedb(dot)com>, 'Sandro Santilli' <strk(at)kbt(dot)io>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-04-21 19:25:00
Message-ID: f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2023-04-21 at 14:23 -0400, Tom Lane wrote:
> postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8'
> LOCALE = 'C';

...

>  test1     | postgres | UTF8     | icu             | C          |
> C          | en-US      |           |
> (4 rows)
>
> Looks like the "pick en-US even when told not to" problem exists here
> too.

Both provider (ICU) and the icu locale (en-US) are inherited from
template0. The LOCALE parameter to CREATE DATABASE doesn't affect
either of those things, because there's a separate parameter
ICU_LOCALE.

This happens the same way in v15, and although it matches the
documentation technically, it is not a great user experience.

I have a couple ideas:

1. Introduce a "none" provider to separate the concept of C/POSIX
locales from the libc provider. It's not really using a provider
anyway, it's just using memcmp(), and I think it causes confusion to
combine them. Saying "LOCALE_PROVIDER=none" is less error-prone than
"LOCALE_PROVIDER=libc LOCALE='C'".

2. Change the CREATE DATABASE syntax to catch these errors better at
the possible expense of backwards compatibility.

I am also having second thoughts about accepting "C" or "POSIX" as an
ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
terribly useful (why not just use memcmp()?), it's not fast in my
measurements (en-US is faster), so maybe it's better to just throw an
error and tell the user to use C (or provider=none as I suggest
above)? 

Obviously the user could manually type "en-US-u-va-posix" if that's the
locale they want. Throwing an error would be a backwards-compatibility
issue, but in v15 an ICU locale of "C" just gives the root locale
anyway, which is probably not what they want.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2023-04-21 19:56:01 Re: Add missing copyright for pg_upgrade/t/* files
Previous Message Konstantin Knizhnik 2023-04-21 19:18:49 Re: Memory leak from ExecutorState context?