Re: ICU integration

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ICU integration
Date: 2017-03-20 03:06:46
Message-ID: 272a46ca-4596-563d-1d9c-7d08a0d35006@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/19/17 18:31, Andreas Karlsson wrote:
> Ok, I applied to patch again and ran the tests. I get a test failure on
> make check-world in the pg_dump tests but it can be fixed with the below.

Fixed. This was a new pg_dump test that had been added in the meantime.

> I believe you are mistaken. The locale "sv" is just an alias for
> "sv-u-standard" as far as I can tell. See the definition of the Swedish
> locale
> (http://source.icu-project.org/repos/icu/trunk/icu4c/source/data/coll/sv.txt)
> and there are just three collations: reformed (default), standard,
> search (plus eot and emoji which are inherited).

I have filed bug <https://ssl.icu-project.org/trac/ticket/13050> about
this. I don't think we can work around this by simply ignoring the
"standard" variant. For example, the URL you show indicates that the
default variant for "sv" is "reformed", not "standard".

> I am also quite annoyed at col-emoji and col-eor (European Ordering
> Rules). They are defined at the root and inherited by all languages, but
> no language modifies col-emoji for their needs which makes it a foot
> gun. See the Danish sorting example below where at least I expected the
> same order. For col-eor it makes a bit more sense since I would expect
> the locales en_GB-u-col-eot and sv_SE-u-col-eot to collate exactly the same.

I think this is just the way the inheritance of locales works there. If
the root has a variant and the more specific locale doesn't, then if you
specify the variant, it gets pulled from the root. It is a bit dubious.
Arguably, either the customizations of the specific locale should be
applied to all variants, or ucol_getKeywordValuesForLocale() shouldn't
return them, or both. Perhaps another bug report.

>>> - I get an error when creating a new locale.
>>>
>>> #CREATE COLLATION sv2 (LOCALE = 'sv');
>>> ERROR: could not create locale "sv": Success
>>>
>>> # CREATE COLLATION sv2 (LOCALE = 'sv');
>>> ERROR: could not create locale "sv": Resource temporarily unavailable
>>> Time: 1.109 ms
>>
>> Hmm, that's pretty straightforward code. What is your operating system?
>> What are the build options? Does it work without this patch?
>
> This issue is unrelated to ICU. I had forgot to specify provider so the
> eorrs are correct (even though that the value of the errno is weird).

Perhaps we can do something about this anyway. What is our OS and version?

>>> - I do not know what the policy for formatting the documentation is, but
>>> some of the paragraphs are in need of re-wrapping.
>>
>> Hmm, I don't see anything terribly bad.
>
> Maybe it is just me who is sensitive to wrapping, but her is an example
> which sticks out to me with its 98 character line.

I see. Fixed some of those.

>>> - The patch adds "FIXME" in the below. Is that a left-over from
>>> development or something which still needs doing?
>>>
>>> /*
>>> * Also forbid matching an any-encoding entry. This test of course
>>> is not
>>> * backed up by the unique index, but it's not a problem since we don't
>>> - * support adding any-encoding entries after initdb.
>>> + * support adding any-encoding entries after initdb. FIXME
>>> */
>>
>> I had mentioned that upthread. It technically needs "doing" as you say,
>> but it's not clear how and it's not terribly important, arguably.
>
> The comment is no longer true since for ICU we can do that (it is not an
> issue though). At the very least this comment needs to be updated.

Fixed that by some additional checks and using ShareRowExclusiveLock in
CREATE COLLATION to prevent concurrent changes.

New patch attached.

I have also implemented additional facilities to check collations
needing refresh after an upgrade (see ALTER COLLATION ref page), and
some necessary bits for that in pg_dump. Some additional support in
pg_upgrade could be useful, but I think we don't need that until people
are upgrading *from* PG10.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v7-0001-ICU-support.patch application/x-patch 166.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-03-20 03:25:58 Re: Determine if an error is transient by its error code.
Previous Message Robert Haas 2017-03-20 02:57:42 Re: Speed up Clog Access by increasing CLOG buffers