ICU for global collation

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: ICU for global collation
Date: 2019-08-20 14:21:21
Message-ID: 5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
Lists: pgsql-hackers

Here is an initial patch to add the option to use ICU as the global
collation provider, a long-requested feature.

To activate, use something like

initdb --collation-provider=icu --locale=...

One catch is that, since we still need to set the normal POSIX locales
as well, the --locale value needs to be valid as both a POSIX locale and
an ICU locale. If that doesn't work out, the ICU locale can also be
specified separately, e.g.,

initdb --collation-provider=icu --locale=en_US.utf8 --icu-locale=en

This complexity is unfortunate, but I don't see a way around it right now.
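
To illustrate why one value often cannot serve both purposes: a POSIX
locale name such as en_US.utf8 carries a codeset suffix (and sometimes an
@modifier) that is not part of a plain ICU locale name. The sketch below
is a hypothetical helper, not something the patch does; it strips those
POSIX-specific parts to get a candidate ICU locale and prints the
resulting initdb command line without running it. The function names are
made up for illustration, and the result still has to be checked by hand
(for example, "C" is left unchanged and is not a meaningful ICU locale).

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: derive a candidate ICU locale
# from a POSIX locale name by dropping the POSIX-specific codeset and modifier.
posix_to_icu_candidate() {
    loc=$1
    loc=${loc%%@*}   # drop a POSIX modifier such as "@euro"
    loc=${loc%%.*}   # drop the codeset suffix such as ".utf8"
    printf '%s\n' "$loc"
}

# Print (do not run) an initdb command line that uses both options
# mentioned in the message above.
initdb_icu_command() {
    posix_locale=$1
    printf 'initdb --collation-provider=icu --locale=%s --icu-locale=%s\n' \
        "$posix_locale" "$(posix_to_icu_candidate "$posix_locale")"
}
```

For example, `initdb_icu_command en_US.utf8` prints a command line with
--locale=en_US.utf8 and --icu-locale=en_US.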

There are also options for createdb and CREATE DATABASE to do this for a
particular database only.

Besides this, the implementation is quite small: When starting up a
database, we create an ICU collator object, store it in a global
variable, and then use it when appropriate. All the ICU code for
creating and invoking those collators already exists, of course.

For version tracking, I use the pg_collation row for the "default"
collation. Again, this mostly reuses existing code and concepts.

Nondeterministic collations are not supported for the global collation,
because then LIKE and regular expressions don't work and that breaks
some system views. This needs some separate research.

To test, run the existing regression tests against a database
initialized with ICU. Perhaps some options for pg_regress could
facilitate that.
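
A rough sketch of what such a manual test run might look like follows.
It only prints the steps rather than executing them. It assumes the
patch is applied (the --collation-provider and --icu-locale options do
not exist otherwise), that the commands are run from the top of a built
source tree, and that the data directory path contains no spaces. The
function name and the log file location are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the steps for running the existing regression tests
# against a cluster initialized with ICU as the global collation provider.
# The initdb options shown exist only with the patch applied.
icu_installcheck_steps() {
    datadir=$1
    # Initialize a cluster with ICU, using the options from the message.
    printf 'initdb -D %s --collation-provider=icu --locale=en_US.utf8 --icu-locale=en\n' "$datadir"
    # Start the server and wait until it accepts connections.
    printf 'pg_ctl -D %s -l %s/server.log -w start\n' "$datadir" "$datadir"
    # Run the regression tests against the running server.
    printf 'make installcheck\n'
    # Shut the server down again.
    printf 'pg_ctl -D %s -w stop\n' "$datadir"
}
```

Calling `icu_installcheck_steps /tmp/icu_data` lists the four commands,
which can then be reviewed and run by hand.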

I fear that the Localization chapter in the documentation will need a
bit of a rewrite after this, because the hitherto separately treated
concepts of locale and collation are fusing together. I haven't done
that here yet, but that would be the plan for later.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v1-0001-Add-option-to-use-ICU-as-global-collation-provide.patch text/plain 43.9 KB
