Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Date: 2017-09-20 18:23:58
Message-ID: CAH2-Wz=ets89wn434EAG4NJB0SJb=irmXQAVm-BPhrYphcHXqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I didn't post the patch that generates the errors in my opening e-mail
> because I'm not confident it's correct just yet. And, I think that I
> see a bigger problem: we pass a string that is almost certainly a BCP
> 47 string to ucol_open() from within pg_newlocale_from_collation(). We
> do so despite the fact that ucol_open() apparently doesn't accept BCP
> 47 syntax locale strings until ICU 54 [1]. Seems entirely possible
> that this accounts for the problems you saw on ICU 4.2, back when we
> were still creating keyword variants (I guess that the keyword
> variants seem very "BCP 47-ish" to me).

ISTM that the proper fix here is to use uloc_forLanguageTag() [1] (not
to be confused with uloc_toLanguageTag()) to get a valid locale
identifier on versions of ICU where BCP 47 format tags are not
directly accepted as locale identifiers (versions prior to ICU 54).
This would happen as an extra step within
pg_newlocale_from_collation(), since BCP 47 format would be what is
stored in pg_collation.

Since uloc_forLanguageTag() become stable in ICU 4.2, the earliest
version that we support, I believe that that would leave us in good
shape.

[1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Darafei Praliaskouski 2017-09-20 19:00:59 Re: compress method for spgist - 2
Previous Message Robert Haas 2017-09-20 18:15:51 Re: POC: Cache data in GetSnapshotData()