Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Regina Obe <lr(at)pcorp(dot)us>, 'Peter Eisentraut' <peter(dot)eisentraut(at)enterprisedb(dot)com>, 'Sandro Santilli' <strk(at)kbt(dot)io>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-04-25 04:31:40
Message-ID: d321c19850a236e76850f00437a34b8f8f1abb90.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote:
> I think I might like this idea, except for one thing: you're
> imagining
> that the locale doesn't control anything except string comparisons.
> What about to_upper/to_lower, character classifications in regexes,
> etc?

If provider='libc' and LC_CTYPE='C', str_toupper/str_tolower are
handled with asc_tolower/asc_toupper. The regex character
classification is done with pg_char_properties. In these cases neither
ICU nor libc is used; it's just code in postgres.

libc is special in that you can set LC_COLLATE and LC_CTYPE separately,
so that different locales are used for sorting and character
classification. That's potentially useful to set LC_COLLATE to C for
performance reasons, while setting LC_CTYPE to a useful locale. We
don't allow ICU to set collation and ctype separately (it would be
possible to allow it, but I don't think there's a huge demand and it's
arguably inconsistent to set them differently).

> (I'm not sure whether those operations can get redirected to ICU
> today
> or whether they still always go to libc, but we'll surely want to fix
> it eventually if the latter is still true.)

Those operations do get redirected to ICU today. There are extensions
that call locale-sensitive libc functions directly, and obviously those
won't use ICU.

> Aside from the user-surprise issues discussed up to now, pg_dump
> scripts
> emitted by pre-v15 pg_dump are not going to contain LOCALE_PROVIDER
> clauses in CREATE DATABASE, and people are going to be very unhappy
> if that means they suddenly get totally different locale semantics
> after restoring into a new DB.

Agreed.

>   I think we need some plan for mapping
> libc-style locale specs into ICU locales so that we can make that
> more nearly transparent.

ICU does a reasonable job mapping libc-like locale names to ICU
locales, e.g. en_US to en-US, etc. The ordering semantics aren't
guaranteed to be the same, of course (because the libc-locales are
platform-dependent), but it's at least conceptually the same locale.

>
> Maybe this means we are not ready to do ICU-by-default in v16.
> It certainly feels like there might be more here than we want to
> start designing post-feature-freeze.

This thread is already on the Open Items list. As long as it's not too
disruptive to others I'll leave it as-is for now to see how this sorts
out. Right now it's not clear to me how much of this is a v15 issue vs
a v16 issue.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-25 04:43:48 Re: Add two missing tests in 035_standby_logical_decoding.pl
Previous Message Amit Kapila 2023-04-25 04:23:25 Re: Add two missing tests in 035_standby_logical_decoding.pl