Re: Order changes in PG16 since ICU introduction

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Order changes in PG16 since ICU introduction
Date: 2023-05-25 16:24:27
Message-ID: a4388fa3acabf7794ac39fdb471ad97eebdfbe11.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2023-05-15 at 12:06 +0200, Peter Eisentraut wrote:
> > === 0002: handle some kinds of libc-stlye locale strings
> >
> > ICU used to handle libc locale strings like 'fr_FR(at)euro', but
> > doesn't
> > in later versions. Handle them in postgres for consistency.
>
> I tend to agree with ICU that these variants are obsolete, and we
> don't
> need to support them anymore.  If this were a tiny patch, then maybe
> ok,
> but the way it's presented here the whole code is duplicated between
> pg_locale.c and initdb.c, which is not great.

I dropped this patch from the series.

> > === 0003: reduce icu_validation_level to WARNING
> >
> > Given that we've seen some inconsistency in which locale names are
> > accepted in different ICU versions, it seems best not to be too
> > strict.
> > Peter Eisentraut suggested that it be set to ERROR originally, but
> > a
> > WARNING should be sufficient to see problems without introducing
> > risks
> > migrating to version 16.
>
> I'm not sure why this is the conclusion.  Presumably, the detection
> capabilities of ICU improve over time, so we want to take advantage
> of
> that?  What are some example scenarios where this change would help?

First of all, I missed this message earlier and I apologize for
proceeding with a commit that contradicted you -- that was not
intentional. The change is small and we can go back if needed.

To restate my reasoning: if we error by default, then changes in ICU
versions can result in errors, which seems too strong to me. I was
hoping to escalate the default for this setting to be "error" down the
road, but it feels like a risk to do so immmediately.

Another thing to consider is that initdb also does validation, and
that's not affected by this GUC. Right now, initdb errors if validation
fails.

> > We've already allowed users to create ICU collations with the C
> > locale
> > in the past, which uses the root collation (not memcmp()), and we
> > need
> > to keep supporting that for upgraded clusters.
>
> I'm not sure I agree that we need to keep supporting that.  The only
> way
> you could get that in past releases is if you specify explicitly,
> "give
> me provider ICU and locale C", and then it wouldn't actually even
> work
> correctly.  So nobody should be using that in practice, and nobody
> should have stumbled into that combination of settings by accident.

OK, then I'm inclined toward the approach to treat iculocale=C with the
memcmp() semantics. Patch added.

I also added a patch with a pg_upgrade check for previous versions with
iculocale=C, to make sure we don't corrupt indexes in case some user
did make that mistake.

> >    3. Introduce collation provider "none", which is always
>
> This seems most attractive, but I think it's quite invasive at this
> point, especially given the dubious premise (see above).

I removed this from the current patch series, and perhaps we should
reconsider it in v17.

> > === 0007: Add a GUC to control the default collation provider
> >
> > Having a GUC would make it easier to migrate to ICU without
> > surprises.
> > This only affects the default for CREATE COLLATION, not CREATE
> > DATABASE
> > (and obviously not initdb).
>
> It's not clear to me why we would want that.  Also not clear why it
> should only affect CREATE COLLATION.

Right now the default for CREATE COLLATION is always libc. For CREATE
DATABASE, it defaults to the template.

I included a patch with a different approach that uses the database
default collation's provider as the default for CREATE COLLATION,
unless LC_COLLATE or LC_CTYPE is specified.

Regards,
Jeff Davis

Attachment Content-Type Size
v6-0001-ICU-support-locale-C-with-the-same-behavior-as-li.patch text/x-patch 11.4 KB
v6-0002-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch text/x-patch 15.2 KB
v6-0003-pg_upgrade-check-for-ICU-locale-C-in-versions-15-.patch text/x-patch 4.7 KB
v6-0004-Use-database-default-collation-s-provider-as-defa.patch text/x-patch 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2023-05-25 16:40:16 Re: Docs: Encourage strong server verification with SCRAM
Previous Message PG Bug reporting form 2023-05-25 15:21:26 BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG