Re: Order changes in PG16 since ICU introduction

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-04-21 19:14:20
Message-ID: 874jp9f5jo.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> Also, somewhere along the line someone broke initdb --no-locale,
>> which should result in C locale being the default everywhere, but
>> when I just tested it it picked 'en' for an ICU locale, which is not
>> the right thing.

Tom> Confirmed:

Tom> $ LANG=en_US.utf8 initdb --no-locale
Tom> The files belonging to this database system will be owned by user "postgres".
Tom> This user must also own the server process.

Tom> Using default ICU locale "en_US".
Tom> Using language tag "en-US" for ICU locale "en_US".
Tom> The database cluster will be initialized with this locale configuration:
Tom> provider: icu
Tom> ICU locale: en-US
Tom> LC_COLLATE: C
Tom> LC_CTYPE: C
Tom> ...

Tom> That needs to be fixed: --no-locale should prevent any
Tom> consideration of initdb's LANG/LC_foo environment.

Would it not also make sense to take into account any --locale and
--lc-* options before choosing an ICU default locale? Right now if you
do, say, initdb --locale=fr_FR you get an ICU locale based on the
environment but lc_* settings based on the option, which seems maximally
confusing.

Also, what happens now to lc_collate_is_c() when the provider is ICU? Am
I missing something, or is it never true now, even if you specified C /
POSIX / en-US-u-va-posix as the ICU locale? This seems like it could be
an important pessimization.

Also also, we now have the problem that it is much harder to create a
'C' collation database within an existing cluster (e.g. for testing)
without knowing whether the default provider is ICU. In the past one
would have done:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';

but now that creates a database that uses the same ICU locale as
template0 by default. If instead one tries:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' ICU_LOCALE='C';

then one gets an error if the default locale provider is _not_ ICU. The
only option now seems to be:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' LOCALE_PROVIDER = 'libc';

which of course doesn't work in older pg versions.
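
For test scripts that have to run against several server versions, one possible workaround is to pick the statement by server_version_num. This is only a rough sketch: create_c_db_sql is a made-up helper name, the 150000 cutoff assumes LOCALE_PROVIDER first appeared in CREATE DATABASE in PG15, and the database name is assumed to be a plain identifier that needs no quoting.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): print a CREATE DATABASE
# statement for a 'C'-collation database, chosen by server version.
#   $1 = server_version_num, e.g. from: psql -At -c "SHOW server_version_num"
#   $2 = database name (assumed to be a plain identifier; no quoting is done)
create_c_db_sql() {
    ver=$1
    db=$2
    if [ "$ver" -ge 150000 ]; then
        # Assumption: PG15+ accepts LOCALE_PROVIDER, so force libc
        # regardless of the cluster's default provider.
        printf "CREATE DATABASE %s TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' LOCALE_PROVIDER = 'libc';\n" "$db"
    else
        # Older servers have no LOCALE_PROVIDER option; spell out the
        # libc settings explicitly.
        printf "CREATE DATABASE %s TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C' LC_CTYPE = 'C';\n" "$db"
    fi
}
```

The output could then be piped into psql, e.g. create_c_db_sql "$(psql -At -c 'SHOW server_version_num')" test | psql. It still means every caller has to carry a version check, which is the inconvenience described above.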

--
Andrew.
