Re: initdb initalization failure for collation "ja_JP"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb initalization failure for collation "ja_JP"
Date: 2017-06-22 23:00:02
Message-ID: 28195.1498172402@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Now the hard part of that is that because pg_import_system_collations
> isn't using a temporary staging table, but is just inserting directly
> into pg_collation, there isn't any way for it to eliminate duplicates
> unless it uses if_not_exists behavior all the time. So there seem to
> be two ways to proceed:
> 1. Drop pg_import_system_collations' if_not_exists argument and just
> define it as adding any collations not already known in pg_collation.
> 2. Significantly rewrite it so that it de-dups the collation set by
> hand before trying to insert into pg_collation.

After further thought, I realized that there's another argument for
making pg_import_system_collations() always do if-not-exists, which
is that as it stands it's inconsistent anyway because it silently
uses if-not-exists behavior for aliases.

So attached is a draft patch that drops if_not_exists and tweaks the
alias insertion code to guarantee deterministic behavior. I also
corrected failure to use CommandCounterIncrement() in the ICU part
of the code, which would cause problems if ICU can ever report
duplicate names; something I don't especially care to assume is
impossible. Also, I fixed things so that pg_import_system_collations()
doesn't emit any chatter about duplicate existing names, because it
didn't take long to realize that that behavior was useless and
irritating. (Perhaps we should change the function to return the
number of entries successfully added? But I didn't do that here.)

I've tested this with a faked version of "locale -a" that generates
duplicated entries, and it does what I think it should.

One question that I've got is why the ICU portion refuses to load
any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
Surely this is completely wrong? I should think that what we load into
pg_collation ought to be independent of template1's encoding, the same
as it is for libc collations, and the right place to be making a test
like that is where somebody attempts to use an ICU collation. But
I've not tried to test it.

Also, I'm a bit tempted to remove setup_collation()'s manual insertion of
'ucs_basic' in favor of adding a DATA() line for that to pg_collation.h.
As things stand, if for some reason "locale -a" were to report a locale
named that, initdb would fail. In the old code, it would not have failed
--- it's uncertain whether you would have gotten the forced UTF8/C
definition or libc's definition, but you would have gotten only one.
However, that approach would result in 'ucs_basic' being treated as
pinned, which perhaps we don't want. If we don't want it to be pinned,
another idea is just to make setup_collation() insert it first not last,
thereby ensuring that the SQL definition wins over any libc entries.

Comments?

regards, tom lane

Attachment Content-Type Size
pg_import_system_collations-fixes.patch text/x-diff 20.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2017-06-22 23:12:58 Re: shift_sjis_2004 related autority files are remaining
Previous Message Alvaro Herrera 2017-06-22 22:02:59 Re: Autovacuum launcher occurs error when cancelled by SIGINT