Re: ICU_LOCALE set database default icu collation but not working as intended.

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "jian he" <jian(dot)universality(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU_LOCALE set database default icu collation but not working as intended.
Date: 2022-05-28 17:18:51
Message-ID: bc2db8c6-dcf8-495b-bdf9-3224a75ddd9a@manitou-mail.org
Lists: pgsql-hackers

jian he wrote:

> - dbicu3, ICU_LOCALE 'en-u-kr-latn-digit-kf-upper-kn-true' seems
> 'kf-upper' not grouped strings beginning with character 'A' together?

You seem to expect that the sort algorithm compares characters
from left to right and, when it reaches 'A' versus 'a', sorts
the string containing 'A' first, no matter what other
characters are in the rest of the string.

I don't think that's what kf-upper does. I think kf-upper kicks in
only for strings that are identical up to the secondary level.
In your example, its only effect is to make 'A 70' sort before
'a 70'. The other strings are unaffected.
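
To illustrate the tie-breaker idea, here is a toy model in Python. It
is not ICU and makes no claim to reproduce its algorithm: the function
names and the sample strings are made up for the illustration, the
"primary" key is just the lowercased string, and case is consulted
only when the primary keys are equal.

```python
def primary_key(s):
    # Toy primary level: ignore case entirely.
    return s.lower()

def case_key(s, upper_first):
    # Toy case level: one flag per letter, where 0 sorts before 1.
    return tuple(int(c.isupper() != upper_first) for c in s if c.isalpha())

def toy_sort(strings, upper_first):
    # Case is only looked at when the primary keys are equal.
    return sorted(strings, key=lambda s: (primary_key(s), case_key(s, upper_first)))

sample = ["a 70", "A 71", "A 70", "a 69"]  # hypothetical data

print(toy_sort(sample, upper_first=True))   # ['a 69', 'A 70', 'a 70', 'A 71']
print(toy_sort(sample, upper_first=False))  # ['a 69', 'a 70', 'A 70', 'A 71']
```

Switching upper_first only swaps 'A 70' and 'a 70'. 'A 71' is never
pulled ahead of 'a 69', because those two already differ before case
is considered.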

> - dbicu4, ICU_LOCALE 'en-u-kr-latn-digit-kn-true' since upper/lower not
> explicitly mentioned, and since the collation is deterministic, so
> character 'A' should be grouped together first then do the numeric value

The deterministic property is only relevant when ICU compares two
strings as equal. Since your collations use the default strength
setting (tertiary) and the strings in your example all differ at that
level, the fact that the collation is deterministic plays no role in
the results.
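
As a rough sketch of that control flow (again a toy, not PostgreSQL's
or ICU's code): toy_collator_cmp below is a stand-in that treats
canonically equivalent strings as equal, and the byte-wise comparison
is reached only when it returns 0. Both function names are
hypothetical, and the stand-in's ordering of unequal strings is plain
code point order, which is not what ICU produces.

```python
import unicodedata

def toy_collator_cmp(a, b):
    # Stand-in for the collator: canonically equivalent strings compare equal.
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)

def deterministic_cmp(a, b):
    # Returns (comparison result, whether the byte-wise tie-breaker was used).
    result = toy_collator_cmp(a, b)
    if result != 0:
        return result, False
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ab > bb) - (ab < bb), True

# The collator already sees a difference, so the tie-breaker is not reached.
print(deterministic_cmp("A 70", "a 70")[1])        # False

# Precomposed vs. decomposed "é": equal for the collator, so bytes decide.
print(deterministic_cmp("\u00e9", "e\u0301")[1])   # True
```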

Besides, the TR35 doc says this about "kn" (numeric ordering):

"If set to on, any sequence of Decimal Digits (General_Category = Nd in
the [UAX44]) is sorted at a primary level with its numeric value"

which means that the order of the numbers (7, 11, 19, 70, 117) is
"stronger" (primary level) than the relative order of the 'a' and 'A'
that precede them (a case difference, which is at the tertiary level).
That's why these numbers drive the sort for these strings that are
otherwise identical at the primary level.
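
A small Python toy can show which level decides a given pair. It is a
simplification made up for this illustration, not ICU: digit runs
become integers in the primary key, letters are lowercased, accents
(the secondary level) are ignored, and the sample strings are
hypothetical.

```python
import re

def primary_key(s):
    # Toy primary level: letters compare without case, and each run of
    # digits compares by its numeric value (in the spirit of kn-true).
    key = []
    for digits, other in re.findall(r"(\d+)|(\D)", s):
        if digits:
            key.append((1, int(digits), ""))
        else:
            key.append((0, 0, other.lower()))
    return key

def first_differing_level(a, b):
    # Report the first toy level at which the two strings differ.
    if primary_key(a) != primary_key(b):
        return "primary"
    if a != b:
        return "tertiary"
    return "none"

sample = ["a 117", "A 70", "a 7", "A 11", "A 19"]  # hypothetical data

print(sorted(sample, key=primary_key))
# ['a 7', 'A 11', 'A 19', 'A 70', 'a 117']

print(first_differing_level("a 7", "A 11"))    # primary
print(first_differing_level("A 70", "a 70"))   # tertiary
```

Only the last pair gets as far as looking at case. Every other pair
is already decided by the numbers.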

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
