Re: ICU for global collation

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>
Subject: Re: ICU for global collation
Date: 2021-12-30 12:07:21
Message-ID: 525ef44f-52bf-505f-a491-07835d039424@enterprisedb.com
Lists: pgsql-hackers


There were a few inquiries about this topic recently, so I dug up the
old thread and patch. What we got stuck on last time was that we can't
just swap out all locale support in a database for ICU. We still need
to set the usual libc locale environment; otherwise, things that are not
ICU-aware will break or degrade. I had initially anticipated fixing
that by converting everything that uses libc locales to ICU. But that
turned out to be tedious and ultimately not very useful as far as the
user-facing result is concerned, so I gave up.

So this is a different approach: If you choose ICU as the default locale
for a database, you still need to specify lc_ctype and lc_collate
settings, as before. Unlike in the previous patch, where the ICU
collation name was written in datcollate, there is now a third column
(daticucoll), so we can store all three values. This fixes the
described problem. Other than that, once you get all the initial
settings right, it basically just works: The places that already have
ICU support will use the database-wide ICU collation where appropriate,
and the places that don't have ICU support continue to use the global
libc locale settings.
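
As a rough illustration of the fallback described above, here is a minimal Python sketch. It is not code from the patch: the DatabaseLocale class, the collation_for function, and the caller_supports_icu flag are hypothetical names used only to model the three catalog columns and the choice between providers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatabaseLocale:
    """Hypothetical stand-in for the three locale columns in pg_database."""
    datcollate: str            # libc LC_COLLATE setting, always required
    datctype: str              # libc LC_CTYPE setting, always required
    daticucoll: Optional[str]  # ICU collation name; None models SQL null


def collation_for(db: DatabaseLocale, caller_supports_icu: bool) -> Tuple[str, str]:
    """Return the (provider, locale) pair a given code path would use.

    ICU-aware code uses the database-wide ICU collation when one is set;
    everything else keeps using the libc locale settings.
    """
    if caller_supports_icu and db.daticucoll is not None:
        return ("icu", db.daticucoll)
    return ("libc", db.datcollate)


# Example values are illustrative only.
icu_db = DatabaseLocale("en_US.UTF-8", "en_US.UTF-8", "en-US")
libc_db = DatabaseLocale("en_US.UTF-8", "en_US.UTF-8", None)
```

The point of the sketch is that the libc values are always present, so a code path without ICU support never ends up without a usable locale.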

I changed the datcollate, datctype, and the new daticucoll fields to
type text (from name). That way, the daticucoll field can be set to
null if it's not applicable. Also, the limit of 63 characters can
actually be a problem if you want to use some combination of the options
that ICU locales offer. And for less extreme uses, having
variable-length fields will save some storage, since typical locale
names are much shorter.
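
To make the length limit concrete, here is a small Python sketch that checks whether a locale string would fit in a name column. It assumes the default NAMEDATALEN of 64, which leaves 63 usable bytes. The fits_in_name helper is hypothetical, and the long language tag is a contrived example that stacks several collation options; it is not taken from the patch.

```python
# Default build-time constant in PostgreSQL; a "name" value holds at most
# NAMEDATALEN - 1 bytes.
NAMEDATALEN = 64


def fits_in_name(locale: str) -> bool:
    """Return True if the locale string fits in a name column (hypothetical helper)."""
    return len(locale.encode("utf-8")) <= NAMEDATALEN - 1


# A typical locale name is far below the limit.
simple = "de-DE"

# A contrived BCP 47 tag combining several collation settings.
with_options = (
    "de-DE-u-co-phonebk-ka-shifted-kb-true-kc-true-kf-upper-kn-true-ks-level3"
)

for tag in (simple, with_options):
    print(tag, len(tag), "fits" if fits_in_name(tag) else "too long for name")
```

With type text, both strings can be stored as-is, and the short one takes only as much space as it needs.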

For the same reasons and to keep things consistent, I also changed the
analogous pg_collation fields to type text. This also removes some
awkward code that had to check that colcollate and colctype are the
same for ICU collations, so it's cleaner overall.

Attachment: v3-0001-Add-option-to-use-ICU-as-global-collation-provide.patch (text/plain, 69.7 KB)
