Re: Order changes in PG16 since ICU introduction

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-06-07 21:50:57
Message-ID: 8bdda206-70af-4645-b8f8-e55386a4db94@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis wrote:

> The locale "C" is a special case, documented as a non-locale. So, if
> LOCALE/--locale apply to ICU, then either ICU needs to handle locale
> "C" in the expected way (v8 patch series); or when we see locale "C" we
> need to somehow change the provider into something that can handle it
> (v6 patch series changes it to the "none" provider).

Yes it's a special case but when doing initdb --locale=C, a user does
not need or want an ICU locale. They want the same thing than what v15
does with the same arguments: a template0 database with
datlocprovider='c', datcollate='C', datctype='C', dateiculocale=NULL.

The simplest way to obtain that in v16 is to teach initdb that
--locale=C without the --locale-provider option implies that
--locale-provider=libc ([1])

OTOH what you're proposing with the 0001 patch is much more involved
in terms of tweaking the ICU code so that dateiculocale='C' and
datlocprovider='i' becomes a combination that provides the C semantics
that ICU doesn't have natively.

I don't agree with the reasoning that to make progress with
the other uses of --locale, we need to start by either making ICU
support C/POSIX (0001/0002), or add a new "none/builtin" provider
(previous set of patches).
v16 should not need it any more than v15 did, if v16 does the same as
v15 on locale='C', that is not involve ICU at all.

> Then that enables us to change LOCALE/--locale to apply to ICU, which
> means that a simple command like "initdb --locale=en_US" does a
> sensible thing regardless of the default provider.
>
> I understand you are skeptical of trying to apply an arbitrary locale
> name to ICU, but if they don't specify the provider, what do you expect
> to happen?

It's a hard question because it depends on what people have in their
locale environment combined with what they try to do.
I think that initdb without any locale option should work well in
the majority of environments, but specifying a locale alone will not work
well in a number of cases, so users might end up concluding that they
need to specify not only the provider but lc_collate/lc_ctype.

[1]
https://www.postgresql.org/message-id/360c90b9-7c20-4cec-aade-38e6e3351c05@manitou-mail.org

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-06-07 21:58:09 Re: Let's make PostgreSQL multi-threaded
Previous Message Andres Freund 2023-06-07 21:48:22 Re: Let's make PostgreSQL multi-threaded