Re: initdb initialization failure for collation "ja_JP"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb initialization failure for collation "ja_JP"
Date: 2017-06-23 20:38:42
Message-ID: 8599.1498250322@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> One question that I've got is why the ICU portion refuses to load
> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
> Surely this is completely wrong? I should think that what we load into
> pg_collation ought to be independent of template1's encoding, the same
> as it is for libc collations, and the right place to be making a test
> like that is where somebody attempts to use an ICU collation. But
> I've not tried to test it.

So I did test that, and found out the presumable reason why that's there:
icu_from_uchar() falls over if the database encoding is unsupported, and
we use that to convert ICU "display names" for use as comments for the
ICU collations. But that's not very much less wrongheaded, because it will
allow non-ASCII characters into the initial database contents, which is
absolutely not acceptable. We assume we can bit-copy the contents of
template0 and it will be valid in any encoding.

Therefore, I think the right thing to do is remove that test and change
get_icu_locale_comment() so that it rejects non-ASCII text, making the
encoding conversion trivial, as in the attached patch.
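To illustrate why rejecting non-ASCII makes the conversion trivial, here is a minimal standalone sketch. It is not the attached patch: it does not call ICU, it uses `uint16_t` as an assumed stand-in for ICU's `UChar` (a UTF-16 code unit), and the helper name `ascii_comment_from_utf16` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: take a UTF-16 display name (uint16_t stands in
 * for ICU's UChar) and return a malloc'd NUL-terminated C string, or
 * NULL if any code unit is outside 7-bit ASCII (or if malloc fails).
 * Since every accepted unit is < 128, "encoding conversion" is just a
 * narrowing copy, and the resulting bytes mean the same thing in any
 * server encoding, which is what bit-copying template0 relies on.
 */
static char *
ascii_comment_from_utf16(const uint16_t *units, size_t len)
{
	char	   *result;
	size_t		i;

	assert(units != NULL || len == 0);

	/* Reject the whole comment if any code unit is non-ASCII. */
	for (i = 0; i < len; i++)
	{
		if (units[i] > 127)
			return NULL;
	}

	/* Trivial transcription: each ASCII code unit becomes one byte. */
	result = malloc(len + 1);
	if (result == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		result[i] = (char) units[i];
	result[len] = '\0';
	return result;
}
```

With this design a collation whose English display name still contains a character such as "å" simply gets no comment, rather than putting non-ASCII bytes into the initial catalog contents.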

On my Fedora 25 laptop, the only collations that go without a comment
in this approach are the "nb" ones (Norwegian Bokmål). As I recall,
that locale is a second-class citizen for other reasons already,
precisely because of its loony insistence on a non-ASCII name even
when we're asking for an Anglicized version.

I'm inclined to add a test to reject non-ASCII in the ICU locale names as
well as the comments. We've had to do that for libc locale names, and
this experience shows that the ICU locale maintainers don't have their
heads screwed on any straighter. But this patch doesn't do that.
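The additional check suggested here, applied to locale names themselves, could look roughly like the following sketch. The function names are hypothetical and this is not code from PostgreSQL; it only assumes locale names arrive as NUL-terminated byte strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: accept a locale name only if every byte is 7-bit
 * ASCII. The cast to unsigned char matters: where plain char is signed,
 * bytes >= 0x80 would otherwise compare as negative and slip past a
 * "> 127" test.
 */
static bool
locale_name_is_ascii(const char *name)
{
	const unsigned char *p;

	for (p = (const unsigned char *) name; *p != '\0'; p++)
	{
		if (*p > 127)
			return false;
	}
	return true;
}

/*
 * Compact an array of candidate locale names in place, keeping only the
 * ASCII-only ones in their original order; returns the number kept.
 */
static size_t
filter_ascii_locale_names(const char **names, size_t n)
{
	size_t		kept = 0;
	size_t		i;

	assert(names != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		if (locale_name_is_ascii(names[i]))
			names[kept++] = names[i];
	}
	return kept;
}
```

Filtering before any catalog rows are created means a locale with a name like "bokmål" is skipped outright instead of producing an entry that is not valid in every encoding.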

regards, tom lane

Attachment: fix-ICU-collation-setup.patch (text/x-diff, 3.5 KB)
