Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Date: 2017-09-25 18:40:41
Message-ID: f6c0fca7-e277-3f46-c0c1-adc001bffdd7@2ndquadrant.com
Lists: pgsql-hackers

On 9/22/17 16:46, Peter Geoghegan wrote:
> But you are *already* canonicalizing ICU collation names as BCP 47. My
> point here is: Why not finish the job off, and *also* canonicalize
> collcollate in the same way? This won't break ucol_open() if we take
> appropriate precautions when we go to use the Postgres collation/ICU
> locale.

Reading over this code again, it is admittedly not quite clear why this
"canonicalization" is in there right now. I think it had something to
do with how we built the keyword variants at one point. It might not
make sense. I'd be glad to take that out and use the result straight
from uloc_getAvailable() for collcollate. That is, after all, the
"canonical" version that ICU chooses to report to us.
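
To make the two spellings concrete, here is a minimal sketch of how a plain
ICU locale ID relates to its BCP 47 form. It is an illustration only and
deliberately does not call ICU: the helper name toy_locale_to_langtag is
hypothetical, it handles only simple IDs such as "de_DE" or "sr_Latn_RS",
and it rejects IDs with '@' keywords, whose conversion would need ICU's own
uloc_toLanguageTag(). It is not the code under discussion in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL or ICU code: rewrite a plain ICU
 * locale ID such as "sr_Latn_RS" as a BCP 47-style tag ("sr-Latn-RS").
 * Returns 0 on success, -1 if the ID carries '@' keywords (those need
 * ICU's uloc_toLanguageTag()) or if the result does not fit in the buffer.
 */
static int
toy_locale_to_langtag(const char *locale_id, char *out, size_t cap)
{
    size_t len;
    size_t i;

    assert(locale_id != NULL && out != NULL && cap > 0);

    /* Keyword variants map to -u- extensions; out of scope for this toy. */
    if (strchr(locale_id, '@') != NULL)
        return -1;

    /* ICU's root locale corresponds to the BCP 47 tag "und". */
    if (locale_id[0] == '\0' || strcmp(locale_id, "root") == 0)
        locale_id = "und";

    len = strlen(locale_id);
    if (len + 1 > cap)
        return -1;

    /* For simple IDs the only difference is the subtag separator. */
    for (i = 0; i < len; i++)
        out[i] = (locale_id[i] == '_') ? '-' : locale_id[i];
    out[len] = '\0';
    return 0;
}
```

Given a 32-byte buffer, toy_locale_to_langtag("sr_Latn_RS", buf, sizeof buf)
fills buf with "sr-Latn-RS", while an ID such as "de@collation=phonebook"
is refused rather than guessed at. Storing the uloc_getAvailable() result
unchanged would keep the first spelling in collcollate.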

> One concern that makes me suggest this is: What happens when
> the user *downgrades* ICU version, from a version where collcollate is
> BCP 47 to one where it would not have been at initdb time? That will
> break the downgrade in an unpleasant way, including in installations
> that never do a CREATE COLLATION themselves. We want to be able to
> restore a basebackup on a somewhat different OS, and have that still
> work following REINDEX. At least, that seems like it should be an
> important goal for us.

This is an interesting point, and my proposal above would fix that.
However, I think that taking a PostgreSQL data directory and moving or
copying it to an *older* OS installation is always going to have a
potential for problems. So I wouldn't spend a huge amount of effort
just to fix this specific case.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
