Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Date: 2017-09-25 21:49:37
Message-ID: CAH2-Wzmx6YFHXyjUG7Bo+h6b0FCR-oZKvbB7OB=WR-BCrycHDg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 25, 2017 at 12:52 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> That must have been the real reason why you canonicalized
> pg_collation.collname (I doubt it had anything to do with how keyword
> variants used to be created during initdb, as you suggested). As Tom
> pointed out recently, we've actually always canonicalized collation
> name for libc.

On further examination, none of this really matters, because you
simply cannot store ICU locale names like "en_US" within pg_collation;
it's impossible to do that without breaking many things that have
worked for a long time. initdb already canonicalizes the available
libc collations to produce collations whose names have exactly the
same "en_US" format. There will typically be both "en_US" and
"en_US.utf8" entries within pg_collation with Glibc on Linux, for example
(the former is created as a convenient alias for the latter when the
database encoding is UTF-8).
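
To illustrate the aliasing described above, here is a rough sketch in
Python. It is not the initdb code; strip_encoding() and
collation_names() are hypothetical names, and the sketch assumes every
input locale already matches the database encoding, a check it does not
model.

```python
import re

# Matches an encoding tag such as ".utf8" or ".UTF-8" inside a libc
# locale name. Assumption: the tag consists of letters, digits and "-".
_ENCODING_TAG = re.compile(r"\.[A-Za-z0-9-]*")


def strip_encoding(locale_name):
    """Return the locale name with any encoding tag removed."""
    return _ENCODING_TAG.sub("", locale_name)


def collation_names(libc_locales):
    """List the collation names that would exist for the given locales.

    Each locale keeps its full name; a stripped alias is added when it
    differs from the full name and is not already present.
    """
    names = []
    for loc in libc_locales:
        if loc not in names:
            names.append(loc)
        alias = strip_encoding(loc)
        if alias != loc and alias not in names:
            names.append(alias)
    return names


if __name__ == "__main__":
    print(collation_names(["en_US.utf8", "de_DE.utf8"]))
```

The point is that the stripped alias ("en_US") has the same shape as an
old-style ICU locale name, so the two would be indistinguishable by name
within pg_collation.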

--
Peter Geoghegan
