Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-15 03:55:47
Message-ID: CAH2-WzmeMSgjO-TocH7+v7vuwzv8C77eqGcTCTGSRattOK2ZLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Aug 14, 2017 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
>> Do we really need to pass "if_not_exists = true", anyway? Why
>> shouldn't initdb fail if there are apparent duplicate ICU collations?
>
> Because the villagers will come after you; see commit
> 0b13b2a7712b6f91590b7589a314240a14520c2f.

Fair enough. Changing it is actually unnecessary anyway.
pg_import_system_collations() should simply use uloc_countAvailable()
+ uloc_getAvailable, rather than ucol_countAvailable() +
ucol_getAvailable(). Just like my test program.

FYI, without the keyword variants, but with all base locales, ICU 55
adds 684 base locales to pg_collation. Should I write a real patch to
make the changes I deem necessary, or do you prefer to Peter E.?

The comments that pg_import_system_collations() adds to each locale
have an annoying tendency to exclude anything that isn't ascii-safe,
which is much more noticeable when locales populate pg_collation
instead of collations:

...
foo │ fr-BE-x-icu │ fr-BE │ fr-BE
│ icu │ French (Belgium)
foo │ fr-BF-x-icu │ fr-BF │ fr-BF
│ icu │ French (Burkina Faso)
foo │ fr-BI-x-icu │ fr-BI │ fr-BI
│ icu │ French (Burundi)
foo │ fr-BJ-x-icu │ fr-BJ │ fr-BJ
│ icu │ French (Benin)
foo │ fr-BL-x-icu │ fr-BL │ fr-BL
│ icu │
foo │ fr-CA-x-icu │ fr-CA │ fr-CA
│ icu │ French (Canada)
foo │ fr-CD-x-icu │ fr-CD │ fr-CD
│ icu │ French (Congo - Kinshasa)
foo │ fr-CF-x-icu │ fr-CF │ fr-CF
│ icu │ French (Central African Republic)
foo │ fr-CG-x-icu │ fr-CG │ fr-CG
│ icu │ French (Congo - Brazzaville)
foo │ fr-CH-x-icu │ fr-CH │ fr-CH
│ icu │ French (Switzerland)
foo │ fr-CI-x-icu │ fr-CI │ fr-CI
│ icu │
foo │ fr-CM-x-icu │ fr-CM │ fr-CM
│ icu │ French (Cameroon)
foo │ fr-DJ-x-icu │ fr-DJ │ fr-DJ
│ icu │ French (Djibouti)
...

Notice the lack of descriptions here, just because they weren't
ascii-safe. I now wonder if we should even keep the comment thing at
all. We can eventually add an SQL function, that is called by \dOS+,
that interrogates ICU for a description on the fly. No need to store
anything.

Another thing that I dislike about get_icu_locale_comment() is that
the descriptions we show are always English display names, regardless
of pg_database collation or anything else. We shouldn't hard-code the
use of English for display name, on general principle.

> It's possible that there's an argument for not de-duplicating ICU
> collation names even though we must do so for libc collation names.
> But I'm not exactly sure what that would be. Having initdb fail if
> the ICU configuration is funny isn't going to endear us to anyone.

BTW, get_icu_language_tag() has a buglet. ULOC_FULLNAME_CAPACITY does
not bound the size of a language tag; those can be almost arbitrarily
long, unlike locale identifiers. That should probably be fixed, if
only to avoid setting a bad example.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-08-15 04:23:02 Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Previous Message Noah Misch 2017-08-15 02:52:43 Re: [postgresql 10 beta3] unrecognized node type: 90

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-15 04:23:02 Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Previous Message Alvaro Herrera 2017-08-15 03:45:48 Re: Explicit relation name in VACUUM VERBOSE log