From: | Oleg Tselebrovskiy <o(dot)tselebrovskiy(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | pgsql-docs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Initcap works differently with different locale providers |
Date: | 2025-07-29 04:03:33 |
Message-ID: | 0a54a90a5154281486b1acb07e5650df@postgrespro.ru |
Lists: | pgsql-docs |
Alexander Korotkov wrote at 2025-07-28 17:23:
> On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
>>
>> On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy
>> <o(dot)tselebrovskiy(at)postgrespro(dot)ru> wrote:
>>
>> Greetings, everyone!
>>
>> One of our clients has found a difference in the behaviour of the initcap
>> function when using different locale providers, shown below:
>>
>> postgres=# create database test_db_1 locale_provider=icu
>> locale="ru_RU.UTF-8" template=template0;
>> NOTICE: using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
>> CREATE DATABASE
>> postgres=# \c test_db_1;
>> You are now connected to database "test_db_1" as user "postgres".
>> test_db_1=# select initcap('ЧиЮ А.Ю.');
>> initcap
>> ----------
>> Чию А.ю.
>> (1 row)
>> test_db_1=# select initcap('joHn d.e.');
>> initcap
>> -----------
>> John D.e.
>> (1 row)
>> postgres=# create database test_db_2 locale_provider=libc
>> locale="ru_RU.UTF-8" template=template0;
>> CREATE DATABASE
>> postgres=# \c test_db_2
>> You are now connected to database "test_db_2" as user "postgres".
>> test_db_2=# select initcap('ЧиЮ А.Ю.');
>> initcap
>> ----------
>> Чию А.Ю.
>> (1 row)
>> test_db_2=# select initcap('joHn d.e.');
>> initcap
>> -----------
>> John D.E.
>> (1 row)
>>
>> And an easier reproduction (should work for REL_12_STABLE and up)
>>
>> postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
>> initcap
>> --------------
>> First.second
>> (1 row)
>> postgres=# SELECT initcap('first.second' COLLATE "en_US");
>> initcap
>> --------------
>> First.Second
>> (1 row)
>>
>> This behaviour is reproducible on REL_12_STABLE and up to master
>>
>> I don't believe that this is erroneous behaviour, just differing behaviour,
>> hence only a documentation change proposal.
>>
>> I suggest adding a clarification that this function works differently with
>> the libc and ICU providers, because they differ in what they treat as a "word".
>>
>> In libc, a word is a sequence of alphanumeric characters separated by
>> non-alphanumeric characters (as the documentation says right now).
>> In ICU, words are divided according to Unicode® Standard Annex #29 [1].
>>
>> A similar issue was briefly discussed in [2].
>>
>> The suggested documentation patch is attached (versions for REL_13_STABLE+
>> and for REL_12_STABLE only).
>>
>> [1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
>> [2]:
>> https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
>>
>> Oleg Tselebrovskiy, Postgres Professional
>> Attachments: v1-0001-string-functions.patch, v1-0002-string-functions-REL_12.patch
>>
>>
>> I can confirm initcap works with libc and libicu as you stated. The
>> documentation patch looks good to me. I’ve written a commit message.
>> The REL_12_STABLE branch is not relevant anymore as it’s out of
>> support. I’m going to push this if there are no objections.
>
> I'm sorry for so many messages. My email client just went crazy.
> It should be fixed now.
>
> ------
> Regards,
> Alexander Korotkov
> Supabase
The commit message looks good to me, and I have no objections to ignoring
REL_12_STABLE :)
Thank you!
Regards, Oleg Tselebrovskiy
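
For illustration, the two notions of a "word" described in the quoted message can be sketched in Python. This is only a rough model, not what PostgreSQL, libc, or ICU actually execute. The function names `initcap_alnum` and `initcap_midletter` and the `mid_chars` parameter are hypothetical. The second function approximates just one aspect of UAX #29 (a full stop between two letters does not end a word) and ignores every other word-boundary rule.

```python
def initcap_alnum(text: str) -> str:
    """libc-style rule from the thread: a word is a run of alphanumeric characters."""
    out = []
    in_word = False
    for ch in text:
        if ch.isalnum():
            # First character of a word is upper-cased, the rest lower-cased.
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


def initcap_midletter(text: str, mid_chars: str = ".") -> str:
    """Simplified UAX #29-like rule (an assumption for illustration):
    a character from mid_chars sitting between two letters stays inside the word."""
    out = []
    in_word = False
    for i, ch in enumerate(text):
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        elif (in_word and ch in mid_chars and i + 1 < len(text)
              and text[i - 1].isalpha() and text[i + 1].isalpha()):
            out.append(ch)  # the word continues across this character
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


for sample in ("first.second", "joHn d.e.", "ЧиЮ А.Ю."):
    print(sample, "|", initcap_alnum(sample), "|", initcap_midletter(sample))
```

On the three inputs from the thread, the first function gives the results reported for the libc provider and the second gives those reported for the ICU provider.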