Re: ICU 54 and earlier are too dangerous

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU 54 and earlier are too dangerous
Date: 2023-03-14 01:13:19
Message-ID: 20230314011319.5nofk5nq65in4d7f@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-03-13 16:39:04 -0700, Jeff Davis wrote:
> In ICU 54 and earlier, if ucol_open() is unable to find a matching
> locale, it will fall back to the *environment*.
>
> Using ICU 54:
>
> initdb -D data -N --locale="en_US.UTF-8"
> pg_ctl -D data -l logfile start
> psql postgres -c "create collation asdf(provider=icu, locale='asdf')"
> # returns true
> psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"
> psql postgres -c "alter system set lc_messages='C'"
> pg_ctl -D data -l logfile restart
> # returns false and warns about collation version mismatch
> psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"
>
> This was fixed in ICU 55 to fall back to the root locale instead[1],
> which is stable, has a collator version, and is not dependent on the
> environment. As far as I can tell, 55 and later never fall back to the
> environment when opening a collator (unless you explicitly pass NULL to
> ucol_open(), which is documented).

> It would be nice if we could detect when this fallback-to-environment
> happens, so that we could just refuse to create the bogus collation.
> But I didn't find a good way. There are non-error return codes from
> ucol_open() that seem promising[2], but they aren't actually very
> useful to distinguish the fallback-to-environment case as far as I can
> tell.

What non-error code is returned in the above example?

Can we query the returned collator and see if it matches what we were looking
for?
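
As a rough illustration of that kind of check (a sketch only, not tested
against ICU 54): assume the caller already has the locale name the collator
actually resolved to, for instance from ucol_getLocaleByType() with
ULOC_VALID_LOCALE. The helper names below are hypothetical, and comparing only
the language subtag is an assumption about what "matches" should mean here.

```python
def language_subtag(locale_name: str) -> str:
    """Return the lowercased language part of an ICU or BCP 47 style name.

    The root locale (spelled "root", "und", or empty) maps to "".
    """
    # Strip ICU keywords ("@collation=phonebook") and any encoding (".UTF-8").
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    # Accept both "en_US" and "en-US" separators.
    lang = base.replace("-", "_").split("_", 1)[0].lower()
    return "" if lang in ("root", "und") else lang


def looks_like_fallback(requested: str, resolved: str) -> bool:
    """True if the resolved locale's language differs from the requested one.

    `resolved` is assumed to come from the opened collator; this function
    does not call ICU itself.
    """
    return language_subtag(requested) != language_subtag(resolved)


if __name__ == "__main__":
    # A typo such as 'asdf' resolving to the environment's locale is flagged.
    print(looks_like_fallback("asdf", "en_US"))
    # A regional variant resolving to its base language is not.
    print(looks_like_fallback("de-AT", "de"))
```

A language mismatch would flag both the fallback-to-environment case and the
fallback-to-root case in 55 and later, so a caller would still need to decide
whether the latter should be an error or a warning.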

> Unless someone has a better idea, I think we need to bump the minimum
> required ICU version to 55. That would solve the issue in v16 and
> later, but those using old versions of ICU and old versions of postgres
> would still be vulnerable to these kinds of typos.

I'm a bit confused by the dates. https://icu.unicode.org/download/55m1 says
that version was released 2014-12-17, but the linked issue about root locales
is dated 2018: https://unicode-org.atlassian.net/browse/ICU-10823 - I guess
the issue tracker was migrated at some point, or something like that...

If indeed 2014 is the correct year of release, then it might be ok to increase
the minimum version...

Greetings,

Andres Freund
