Re: [HACKERS] Can ICU be used for a database's default sort order?

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Vladimir Borodin <root(at)simply(dot)name>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Subject: Re: [HACKERS] Can ICU be used for a database's default sort order?
Date: 2018-02-16 08:12:39
Message-ID: 92826DEB-DA8F-4AE4-9C43-03A55D18A766@yandex-team.ru
Lists: pgsql-hackers

Hi everyone!

> On 10 Feb 2018, at 20:45, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> I'm planning to provide review
>

So, I was looking into the patch.
The patch adds:
1. Ability to specify collation provider (with version) in --locale for initdb and createdb.
2. Changes to locale checks
3. Sets ICU as the default collation provider. For example, "ru_RU@icu.153.80.32.1" is the default on my machine with the patch
4. Tests and necessary changes to documentation

With the patch I get the correct ICU ordering by default:
postgres=# select unnest(array['е','ё','ж']) order by 1;
unnest
--------
е
ё
ж
(3 rows)

The libc locale, by contrast, gives an incorrect order (I also get the same ordering by default without the patch):

postgres=# select c from unnest(array['е','ё','ж']) c order by c collate "ru_RU";
c
---
е
ж
ё
(3 rows)
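
For reference, the libc result above coincides with plain Unicode code point order: ё (U+0451) is encoded after the main run of the Cyrillic alphabet, so it lands after ж (U+0436). The following is only an illustrative sketch in Python, not part of the patch; it relies on Python's built-in string comparison, which orders by code point, and does not call libc or ICU.

```python
# Illustrative only: sort the three letters from the example by raw Unicode
# code point. Python's default string comparison is code point based, so no
# locale or collation library is involved here.
letters = ["\u0435", "\u0451", "\u0436"]  # е, ё, ж

# Show each letter with its code point.
for ch in letters:
    print(f"{ch} U+{ord(ch):04X}")

# Code point order puts ё (U+0451) after ж (U+0436), matching the libc
# output above, whereas the ICU ordering keeps ё right after е.
codepoint_order = sorted(letters)
print(codepoint_order)
```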

Unfortunately, neither "ru_RU@icu.153.80.32.1" (as exposed by LC_COLLATE and other places) nor "ru_RU@icu" can be used in a COLLATE clause in SQL.
Also, the patch removes compatibility with MSVC 1800 (Visual Studio 2013) on Windows XP and Windows Server 2003. This is done so that the VS2013 build can use newer locale-related functions.

If a database was initialized with the default locale without this patch, one can no longer connect to it:
psql: FATAL: could not find out the collation provider for datcollate "ru_RU.UTF-8" of database "postgres"
This problem is mentioned in the commit message of the patch. I think it should be addressed somehow.
What do you think?

Overall, the patch looks like solid, thoughtful work and adds important functionality.

Best regards, Andrey Borodin.
