Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Date: 2017-08-08 01:00:05
Message-ID: CAH2-Wzm22vtxvD-e1oz90DE8Z_M61_8amHsDOZf1PWRKfRmj1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 7, 2017 at 3:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The thing that I'm particularly thinking about is that if someone wants
> an ICU variant collation that we didn't make initdb provide, they'll do
> a CREATE COLLATION and go use it. At update time, pg_dump or pg_upgrade
> will export/import that via CREATE COLLATION, and the only way it fails
> is if ICU rejects the collation name as garbage. (Which, as we already
> established upthread, it's quite unlikely to do.)

Actually, it's *impossible* for ICU to fail to accept any string as a
valid locale within CREATE COLLATION, because CollationCreate() simply
doesn't sanitize ICU names. It doesn't do something like call
get_icu_language_tag(), unlike initdb (within
pg_import_system_collations()).

If I add such a test to CollationCreate(), it does a reasonable job of
sanitizing, while preserving the spirit of the BCP 47 language tag
format by not assuming that the user didn't specify a brand new locale
that it hasn't heard of. All of these are accepted with unmodified
master:

postgres=# CREATE COLLATION test1 (provider = icu, locale = 'en-x-icu');
CREATE COLLATION
postgres=# CREATE COLLATION test2 (provider = icu, locale = 'foo bar baz');
ERROR: XX000: could not convert locale name "foo bar baz" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION: get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test3 (provider = icu, locale = 'en-gb-icu');
ERROR: XX000: could not convert locale name "en-gb-icu" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION: get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test4 (provider = icu, locale = 'not-a-country');
CREATE COLLATION

If it's mandatory for get_icu_language_tag() to not throw an error
during initdb import when passed strings like these (that are
generated mechanically), why should we not do the same with CREATE
COLLATION? While the choice to preserve BCP 47's tolerance of missing
collations is debatable, not doing at least this much up-front is a
bug IMV.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2017-08-08 01:04:44 Re: [TRAP: FailedAssertion] causing server to crash
Previous Message Masahiko Sawada 2017-08-08 00:57:41 Re: Subscription code improvements