Re: ICU locale validation / canonicalization

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-03-22 18:05:31
Message-ID: c6dd63de8267ad8267ce4ec939f72c43f1fc2020.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2023-03-21 at 10:35 +0100, Peter Eisentraut wrote:
> [PATCH v6 1/6] Support language tags in older ICU versions (53 and
>   earlier).
>
> In pg_import_system_collations(), this is now redundant and can be
> simplified:
>
> -               if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
> +               if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))
>
> icu_set_collation_attributes() needs more commenting about what is
> going
> on.  My guess is that uloc_canonicalize() converts from language tag
> to
> ICU locale ID, and then the existing logic to parse that apart would
> apply.  Is that how it works?

Fixed the redundancy, added some comments, and committed 0001.

> [PATCH v6 2/6] Wrap ICU ucol_open().
>
> It makes sense to try to unify some of this.  But I find the naming
> confusing.  If I see pg_ucol_open(), then I would expect that all
> calls
> to ucol_open() would be replaced by this.  But here it's only a few,
> without explanation.  (pg_ucol_open() has no explanation at all
> AFAICT.)

The remaining callsite which doesn't use the wrapper is in initdb.c,
which can't call into pg_locale.c, and has different intentions. initdb
uses ucol_open to get the default locale if icu_locale is not
specified; and it also uses ucol open to verify that the locale can be
opened (whether specified or the default). (Aside: I created a tiny
0004 patch which makes this difference more clear and adds a nice
comment.)

There's no reason to use a wrapper when getting the default locale,
because it's just passing NULL anyway.

When verifying that the locale can be opened, ucol_open() doesn't catch
many problems anyway, so I'm not sure it's worth a lot of effort to
copy these extra checks that the wrapper does into initdb.c. For
instance, what's the value in replacing "und" with "root" if opening
either will succeed? Parsing the attributes can potentially catch
problems, but the later patch 0006 will check the attributes when
converting to a language tag at initdb time.

So I'm inclined to just leave initdb alone in patches 0002 and 0003.

> I have in my notes that check_icu_locale() and make_icu_collator()
> should be combined into a single function.  I think that would be a
> better way to slice it.

That would leave out get_collation_actual_version(), which should
handle the same fixups for attributes and the "und" locale.

> Btw., I had intentionally not written code like this
>
> +#if U_ICU_VERSION_MAJOR_NUM < 54
> +       icu_set_collation_attributes(collator, loc_str);
> +#endif
>
> The disadvantage of doing it that way is that you then need to dig
> out
> an old version of ICU in order to check whether the code compiles at
> all.  With the current code, you can be sure that that code compiles
> if
> you make changes elsewhere.

I was wondering about that -- thank you, I changed it back to use "if"
rather than "#ifdef".

New series attached (starting at 0002 to better correspond to the
previous series).

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
v7-0002-Wrap-ICU-ucol_open.patch text/x-patch 4.0 KB
v7-0003-Handle-the-und-locale-in-ICU-versions-54-and-olde.patch text/x-patch 5.6 KB
v7-0004-Accept-C-POSIX-locales-when-converting-to-languag.patch text/x-patch 10.7 KB
v7-0005-initdb-emit-message-when-using-default-ICU-locale.patch text/x-patch 3.3 KB
v7-0006-Canonicalize-ICU-locale-names-to-language-tags.patch text/x-patch 17.0 KB
v7-0007-Validate-ICU-locales.patch text/x-patch 9.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Luzanov 2023-03-22 18:11:16 Re: psql: Add role's membership options to the \du+ command
Previous Message Jeff Davis 2023-03-22 18:05:03 Re: Request for comment on setting binary format output per session