ICU 54 and earlier are too dangerous

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ICU 54 and earlier are too dangerous
Date: 2023-03-13 23:39:04
Message-ID: ea927ede4e8a8f3ba515b15a083577a68e9f9201.camel@j-davis.com
Lists: pgsql-hackers


In ICU 54 and earlier, if ucol_open() is unable to find a matching
locale, it will fall back to the *environment*.

Using ICU 54:

initdb -D data -N --locale="en_US.UTF-8"
pg_ctl -D data -l logfile start
psql postgres -c "create collation asdf(provider=icu, locale='asdf')"
# returns true
psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"
psql postgres -c "alter system set lc_messages='C'"
pg_ctl -D data -l logfile restart
# returns false and warns about collation version mismatch
psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"

This was fixed in ICU 55 to fall back to the root locale instead[1],
which is stable, has a collator version, and is not dependent on the
environment. As far as I can tell, 55 and later never fall back to the
environment when opening a collator (unless you explicitly pass NULL to
ucol_open(), which is documented).

It would be nice if we could detect when this fallback-to-environment
happens, so that we could just refuse to create the bogus collation.
But I didn't find a good way. There are non-error return codes from
ucol_open() that seem promising[2], but they aren't actually very
useful to distinguish the fallback-to-environment case as far as I can
tell.

Unless someone has a better idea, I think we need to bump the minimum
required ICU version to 55. That would solve the issue in v16 and
later, but those using old versions of ICU and old versions of postgres
would still be vulnerable to these kinds of typos.
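
For those still on older installations, a rough version gate could look
something like the sketch below. It is only an illustration: the helper
name icu_version_ok is hypothetical, and it assumes the ICU version is
available as a dotted string such as "54.1" (for example from
pkg-config's icu-i18n module, if that matches the library actually
linked).

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL or ICU): succeed only if the
# given ICU version string, e.g. "54.1" or "72.1", has a major version >= 55.
icu_version_ok() {
    major=${1%%.*}                 # keep everything before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric: version unknown
    esac
    [ "$major" -ge 55 ]            # 0 if new enough, 1 if too old
}

# Example use, assuming pkg-config can see the ICU build in question:
#   icu_version_ok "$(pkg-config --modversion icu-i18n)" \
#       || echo "ICU older than 55: unknown locales may fall back to the environment" >&2
```

This only checks the version number; it does not detect whether a
particular ucol_open() call actually fell back to the environment.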

Regards,
Jeff Davis

[1] https://icu.unicode.org/download/55m1
[2] https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utypes_8h.html#a3343c1c8a8377277046774691c98d78c
