Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-11 23:59:44
Message-ID: CA+hUKGLQD8y+S5rUTKm6wS_CZ++TN3bM3cr9UoxkhacBmmj1Dw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 11, 2022 at 4:21 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> What about "time travel collations", but without the time travel part?
> That is, what about supporting multiple ICU versions per cluster, but
> not per database? So you could upgrade the OS and Postgres, using
> standard packages that typically just use the latest ICU version --
> typically, but not always. If you happen to have been on an older
> version of ICU on upgrade, then that version of ICU will still work at
> the level of a whole database -- your database. Maybe you can create
> new databases with old and new ICU versions if you want to.
>
> That obviously runs into the problem of needing to eventually do a
> dump and reload -- but I suppose that "eventually" could be a very
> long time. At least the OS package doesn't declare one version of ICU
> the blessed version, now and forever, effectively vendoring ICU in a
> backdoor fashion. At least old databases have significant runway,
> while at the same time new databases that want to use the same
> standard Postgres package aren't forced to use the same ancient ICU
> version.

Hmm. I think that's effectively what you'd get using my "distinct
collation" patch (v1, or this much better v3, attached), if you put
version prefixes in colliculocale, and updated them in the template
database after an OS upgrade to affect new databases. I realise you
probably mean something a little more automatic...

I think "pinned forever" ICU versions would be useful. Very few expert
users want fine-grained, general control over version changes, and
almost all other users don't care about any of this stuff -- as long as
their indexes keep indexing and their constraints keep constraining.
So I think you can make a lot of people happy by ignoring the
complexities of upgrades and providing a way to nail the version down
for the lifetime of the database. Also, it's not *impossible* to move
to a later ICU, it's just a bit tricky; the key point is that it's
under your control if you want to do that, independently of an OS
upgrade, as you said.

Based on my reading of that DB2 manual page, I reckon my v3 "distinct
collation" patch is about as good as what they have. If you don't
choose to use prefixes then later OS upgrades (i.e. upgrades that change
the version of ICU that PostgreSQL is linked against) might corrupt
your indexes and constraints -- I think that's what they're saying --
though at least we'll try to warn about that with our weak warning
system. If you do choose to use prefixes you'll be stuck on that ICU
version forever, even across updates that cause PostgreSQL to be
linked to future releases of ICU, unless you're prepared to do a whole
bunch of careful upgrading work (perhaps with some future tooling to
help with that).

Attached is a much more fleshed out version of the "distinct
collation" patch (to be clear: v3 descends from v1, while v2 was an
attempt at the timelord approach). Main changes:

1. I now also route strTo{Upper,Lower,Title} to the right version of
ICU. That requires dlopen'ing a second library (libicuuc, alongside
libicui18n).

2. You create distinct collations with optional ICU major version
prefixes, along the lines of what Jim was suggesting. For example
(and I'm not actually proposing -x-icu67 suffixes, just avoiding a
collision in my example):

postgres=# create collation "en-x-icu67" (provider = icu, locale = '67:en');
CREATE COLLATION
postgres=# create collation "en-x-icu63" (provider = icu, locale = '63:en');
CREATE COLLATION

3. For versions other than the one we are linked against, it tries to
open libraries with typical names inside $libdir. An administrator
could drop symlinks in there like so:

$ ln -s /usr/lib/x86_64-linux-gnu/libicui18n.so.63 ~/install/lib/postgresql/
$ ln -s /usr/lib/x86_64-linux-gnu/libicuuc.so.63 ~/install/lib/postgresql/
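
To make points 2 and 3 a bit more concrete, here is a rough standalone
sketch of the lookup steps: split a locale string like '63:en' into a
major version and a plain locale, then work out which file to dlopen()
and which symbol to dlsym(). This is not code from the patch. The
function names are made up for illustration, the ".so.N" file naming is
Linux-specific, and the "_63" symbol suffix assumes ICU was built with
its default versioned symbol renaming.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "63:en" into major version 63 and locale
 * "en".  Returns 0, meaning "use the ICU we are linked against", when
 * there is no well-formed numeric prefix; *locale then points at the
 * whole input string.
 */
static int
parse_icu_prefix(const char *spec, const char **locale)
{
    char *end;
    long  major;

    assert(spec != NULL && locale != NULL);
    *locale = spec;
    if (spec[0] < '0' || spec[0] > '9')
        return 0;               /* no leading digit, so no prefix */
    major = strtol(spec, &end, 10);
    if (*end != ':' || major <= 0 || major > 999)
        return 0;               /* digits not followed by ':' */
    *locale = end + 1;
    return (int) major;
}

/*
 * Hypothetical helper: build "<libdir>/lib<libname>.so.<major>", e.g.
 * ".../libicui18n.so.63".  Returns 1 on success, 0 if buf is too small.
 */
static int
icu_library_path(char *buf, size_t buflen, const char *libdir,
                 const char *libname, int major)
{
    int n = snprintf(buf, buflen, "%s/lib%s.so.%d", libdir, libname, major);

    return n > 0 && (size_t) n < buflen;
}

/*
 * Hypothetical helper: build a versioned ICU symbol name such as
 * "ucol_open_63" (assumes default ICU symbol renaming).
 * Returns 1 on success, 0 if buf is too small.
 */
static int
icu_symbol_name(char *buf, size_t buflen, const char *func, int major)
{
    int n = snprintf(buf, buflen, "%s_%d", func, major);

    return n > 0 && (size_t) n < buflen;
}
```

The resulting path would be handed to dlopen() and the symbol name to
dlsym(). A result of 0 from parse_icu_prefix() would mean skipping all
of that and calling the linked-in ICU directly.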

What I like about this $libdir scheme is that I imagine that we could
ask our friends in the packaging teams to create packages for that.
Users would then think of them in much the same way as extensions.
You'd just type:

$ sudo apt-get install postgresql-16-icu71

Happy to keep trying to figure out the competing and rather more
ambitious time-travel (TT) version too (which I sketched some bits of
in v2), but I'm a whole lot fuzzier on how that can work and kinda
stuck on the problems I raised.

Attachment: v3-0001-WIP-multi-ICU-distinct-collations.patch (text/x-patch, 26.6 KB)
