Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-10-21 21:24:06
Message-ID: CA+hUKGL36vXMfcaDq+U1ZkoSsdfFnNx7GxhGM7aYzEbKs1W0=Q@mail.gmail.com
Lists: pgsql-hackers

Hi,

Here is a rebase of this experimental patch. I think the basic
mechanics are promising, but we haven't agreed on a UX. I hope we can
figure this out.

Restating the choice made in this branch of the experiment: Here I
try to be just like DB2 (if I understood its manual correctly).
In DB2, you can use names like "en_US" if you don't care about
changes, and names like "CLDR181_en_US" if you do. It's the user's
choice to use the second kind to avoid "unexpected effects on
applications or database objects" after upgrades. Translated to
PostgreSQL concepts, you can use a database default ICU locale like
"en-US" if you don't care and "67:en-US" if you do, and for COLLATION
objects it's the same. The convention I tried in this patch is that
you use either "en-US-x-icu" (which points to "en-US") or
"en-US-x-icu67" (which points to "67:en-US") depending on whether you
care about this problem.
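
To make the naming convention concrete, here is a rough sketch in Python (for brevity; the patch itself is C, and this is not its code). The helper names split_icu_locale and collation_name are made up for illustration; only the "67:en-US" and "en-US-x-icu67" shapes come from the description above.

```python
import re

# Hypothetical helper: recognise an optional "<major>:" prefix on an ICU
# locale string, e.g. "67:en-US".
_PREFIX = re.compile(r"^(\d+):(.+)$")


def split_icu_locale(spec):
    """Return (major, locale); major is None when no version prefix is given."""
    m = _PREFIX.match(spec)
    if m:
        return int(m.group(1)), m.group(2)
    return None, spec


def collation_name(spec):
    """Map a locale spec to the collation object name used in this experiment."""
    major, locale = split_icu_locale(spec)
    suffix = "-x-icu" if major is None else f"-x-icu{major}"
    return locale + suffix
```

So "en-US" would map to "en-US-x-icu" and "67:en-US" to "en-US-x-icu67".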

I recognise that this is a bit cheesy: it's all the user's problem to
deal with or ignore.

An alternative mentioned by Peter E was that the locale names
shouldn't carry the prefix, but somehow we should have a list of ICU
versions to search for a matching datcollversion/collversion. How
would that look? Perhaps a GUC, icu_library_versions = '63, 67, 71'?
There is currently a natural and smallish range of supported versions,
probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a
bit weird to try to dlopen ~25 libraries or whatever it might be...
Do you think we should try to code this up?
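
For discussion, the search could look roughly like the following Python sketch. Everything here is an assumption: the GUC name is only the one floated above, and get_collversion stands in for "load that ICU major version and ask it for the collator version of this locale", which is not implemented here.

```python
def parse_version_list(setting):
    """Parse a setting such as '63, 67, 71' into a list of major versions."""
    return [int(tok.strip()) for tok in setting.split(",") if tok.strip()]


def find_library_for(collversion, locale, candidates, get_collversion):
    """Return the first candidate major version whose reported collator
    version for `locale` equals the recorded collversion, else None.

    get_collversion(major, locale) is a hypothetical callback supplied by
    the caller.
    """
    for major in candidates:
        if get_collversion(major, locale) == collversion:
            return major
    return None
```

The order of the list would then decide which library wins if two ICU versions happen to report the same collator version for a locale.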

I haven't tried it, but the main usability problem I predict with that
idea is this: It can cope with a scenario where you created a
database with ICU 63 and started using a default of "en" and maybe
some explicit fr-x-icu or whatever, and then you upgrade to a new
postgres binary using ICU 71, and, as long as you still have ICU 63
installed, it'll just magically keep using 63, now via dlopen(). But it
doesn't provide a way for me to create a new database that uses 63 on
purpose when I know what I'm doing. There are various reasons I might
want to do that.

Maybe the ideas could be combined? Perhaps "en" means "create using
binary's linked ICU, open using search-by-collversion", while "67:en"
explicitly says which to use?
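
A rough Python sketch of how that combined rule might dispatch, purely as a thinking aid. resolve_icu_major and its parameters are hypothetical, and what should happen when the search finds nothing is left to the caller, since that hasn't been decided.

```python
def resolve_icu_major(spec, linked_major, stored_collversion=None, search=None):
    """Decide which ICU major version to use for a locale spec.

    spec               -- "en" or "67:en"
    linked_major       -- ICU version the binary is linked against
    stored_collversion -- recorded version when opening an existing object,
                          None when creating a new one
    search             -- hypothetical callback (collversion, locale) -> major
                          or None, implementing search-by-collversion
    """
    prefix, sep, rest = spec.partition(":")
    if sep and prefix.isdigit():
        return int(prefix), rest        # explicit version always wins
    if stored_collversion is None:
        return linked_major, spec       # create: use the linked ICU
    found = search(stored_collversion, spec) if search else None
    return found, spec                  # open: None means "no match"
```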

Changes since last version:

* Now it just uses the default dlopen() search path, unless you set
icu_library_path. Is that a security problem? It's pretty
convenient, because it means you can just "apt-get install libicu63"
(or local equivalent) and that's all, now 63 is available.

* To try the idea out, I made it automatically create "*-x-icu67"
alongside the regular "-x-icu" collation objects at initdb time.
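
The library lookup in the first item can be illustrated with a small Python sketch using ctypes, which calls dlopen() underneath on Linux. This is not the patch's code: the function names are invented, and the "libicui18n.so.<major>" file name is the usual Linux naming, assumed here. A bare file name is resolved through the default search path, while a name containing a slash is opened as a path.

```python
import ctypes


def icu_library_filename(major, library_path=None):
    """Build the name to hand to dlopen() for a given ICU major version."""
    name = f"libicui18n.so.{major}"          # assumed Linux-style naming
    if library_path:
        return library_path.rstrip("/") + "/" + name
    return name                              # bare name: default search path


def try_load_icu(major, library_path=None):
    """Return a loaded library handle, or None if that version is unavailable."""
    try:
        return ctypes.CDLL(icu_library_filename(major, library_path))
    except OSError:
        return None
```

With no path set, try_load_icu(63) would succeed as soon as the distribution's ICU 63 package is installed, which is the convenience (and the possible security question) raised above.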

Attachment Content-Type Size
v5-0001-WIP-Multi-version-ICU.patch application/x-patch 30.9 KB
