Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Sandro Santilli <strk(at)kbt(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Regina Obe <lr(at)pcorp(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-05-21 01:42:55
Message-ID: 7de366ed73a6a68c30dd37e32ed740298cfe50b3.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2023-05-19 at 21:13 +0200, Daniel Verite wrote:
> ISTM that if we want to go that route, we need the make the minimum
> changes at the user interface level and not any deeper, so that when
> (locale="C" OR locale="POSIX") AND the provider has not been
> specified,
> then the command (initdb and create database) act as if the user had
> specified provider=libc.

If we special case locale=C, but do nothing for locale=fr_FR, then I'm
not sure we've solved the problem. Andrew Gierth raised the issue here,
which he called "maximally confusing":

https://postgr.es/m/874jp9f5jo.fsf@news-spur.riddles.org.uk

That's why I feel that we need to make locale apply to whatever the
provider is, not just when it happens to be C.

> > (3) Support iculocale=C in the ICU provider using the memcmp()
> > path.
>
> > In other words, if provider=icu and iculocale=C, lc_collate_is_c()
> > and
> > lc_ctpye_is_c() would both return true.
>
> ICU does not provide a locale that behaves like that, and it doesn't
> feel right to pretend it does. It feels like attacking the problem
> at the wrong level.

I agree that #3 feels slightly wrong, but I think it's still a viable
option until we have consensus on something better.

> > (4) Create a new "none" provider (which has no locale and always
> > memcmp
> > semantics), and automatically change the provider to "none" if
> > provider=icu and iculocale=C.
>
> It still uses libc/C for character classification and case changing,
> so "no locale" is technically not true.

The provider affects callers that have a pg_locale_t, such as the SQL-
callable lower() function. For those callers, the "none" provider uses
pg_ascii_tolower(), etc., not libc. That's why I called it "none" --
it's using simple internal postgres implementations instead of a
provider.

For callers that don't have a pg_locale_t, they may call libc functions
directly and rely on the server environment. But in those cases,
there's no way to set a provider at all, it's just relying on the
server environment. There aren't many of these cases, and hopefully we
can eliminate the reliance on the server environment over time.

If I'm missing something, let me know what cases you have in mind.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-05-21 03:01:41 LLVM 16 (opaque pointers)
Previous Message Bruce Momjian 2023-05-21 01:33:44 Re: Naming of gss_accept_deleg