Re: initdb initalization failure for collation "ja_JP"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb initalization failure for collation "ja_JP"
Date: 2017-06-20 15:37:02
Message-ID: 9154.1497973022@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com> writes:
>> Building on Cygwin latest 10 beta1 or head source,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
>> CEST [16860] FATAL: collation "ja_JP" for encoding "EUC_JP" already exists

> Hmph. Could we see the results of "locale -a | grep ja_JP" ?

Despite the lack of followup from the OP, I'm pretty troubled by this
report. It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding. We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e. AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.

I suppose the idea was to see whether we actually needed those defenses,
but since we have here a failure report after less than a month of beta,
it seems clear to me that we do. I think we need to upgrade
pg_import_system_collations to have all the same logic that was there
before.

Now the hard part of that is that because pg_import_system_collations
isn't using a temporary staging table, but is just inserting directly
into pg_collation, there isn't any way for it to eliminate duplicates
unless it uses if_not_exists behavior all the time. So there seem to
be two ways to proceed:

1. Drop pg_import_system_collations' if_not_exists argument and just
define it as adding any collations not already known in pg_collation.

2. Significantly rewrite it so that it de-dups the collation set by
hand before trying to insert into pg_collation.

#2 seems like a lot more work, but on the other hand, we might need
most of that logic anyway to get back deterministic alias handling.
However, since I cannot see any real-world use case at all for
if_not_exists = false, I figure we might as well do #1 and take
whatever simplification we can get that way.
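
To make the by-hand de-duplication in #2 concrete, here is a rough
sketch in Python rather than backend C. The function name, the
(collname, encoding, locale) tuple layout, and the tie-break rule
(prefer a locale whose name equals the collation name, then the
lexically smallest locale) are all assumptions for illustration; this
is not the logic of initdb or of pg_import_system_collations.

```python
def dedup_collations(candidates):
    """Keep one entry per (collname, encoding) key.

    candidates: iterable of (collname, encoding, locale) tuples, for
    example built from "locale -a" output plus stripped short aliases.
    The tie-break below is a hypothetical rule chosen only to make the
    result deterministic regardless of input order.
    """
    chosen = {}
    for collname, encoding, locale in candidates:
        key = (collname, encoding)
        # Lower rank wins: exact name match first, then lexical order.
        rank = (collname != locale, locale)
        if key not in chosen or rank < chosen[key][0]:
            chosen[key] = (rank, locale)
    # Sort by key so the output order is stable as well.
    return [(name, enc, loc)
            for (name, enc), (_, loc) in sorted(chosen.items())]


candidates = [
    ("ja_JP", "EUC_JP", "ja_JP.eucjp"),   # short alias of ja_JP.eucjp
    ("ja_JP", "EUC_JP", "ja_JP"),         # locale reported directly
    ("ja_JP", "EUC_JP", "ja_JP.ujis"),    # another alias, same key
    ("ja_JP.utf8", "UTF8", "ja_JP.utf8"),
]
deduped = dedup_collations(candidates)
```

With input like the above, three candidates collide on
("ja_JP", "EUC_JP"); inserting each of them directly would hit the
"already exists" error from the report, whereas after de-duplication
only one row per key is left to insert.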

I'm willing to do the legwork on this, but before I start, does
anyone have any ideas or objections?

regards, tom lane
