Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-11 01:48:19
Message-ID: CA+hUKGLBKPt0q76G3Jpc=DG4H30-j2voLv4wLeaRDOKjFmo1Zg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2022 at 4:30 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I'm not sold on any particular plan, but working through some examples
> helped me see your idea better... I may try to code that up in a
> minimal way so we can kick the tyres...

I did a bit of hacking on that idea. The goal was to stamp each index
with an ICU major version (not sure where, not done in the attached),
and if that doesn't match the library we're linked against, we'd try
to dlopen() libraries via symlinks with known name formats under
PGDATA/pg_icu_lib, which an administrator would have to create. That
seemed a bit simpler than dealing with new catalogs for now...
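
To make the lookup step concrete, here is a minimal sketch of how a
library and symbol might be resolved for a given major version. It is
not taken from the attached patch. The symlink name format
(libicui18n.so.<major>) and the helper names icu_symlink_path,
icu_symbol_name and icu_lookup are assumptions for illustration only.
It does rely on the fact that ICU, by default, suffixes its exported
symbols with the major version (e.g. ucol_open_63).

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Build the path of the administrator-created symlink.
 * ASSUMPTION: the layout <PGDATA>/pg_icu_lib/libicui18n.so.<major> is
 * hypothetical; the email only says "known name formats".
 * Returns 1 on success, 0 if the buffer is too small.
 */
static int
icu_symlink_path(char *buf, size_t len, const char *pgdata, int major)
{
    int n = snprintf(buf, len, "%s/pg_icu_lib/libicui18n.so.%d",
                     pgdata, major);

    return n > 0 && (size_t) n < len;
}

/*
 * ICU's default symbol renaming appends "_<major>" to each function,
 * so the same base name resolves to a different symbol per version.
 */
static int
icu_symbol_name(char *buf, size_t len, const char *base, int major)
{
    int n = snprintf(buf, len, "%s_%d", base, major);

    return n > 0 && (size_t) n < len;
}

/*
 * Hypothetical helper: open the library for "major" and look up one
 * function. Returns NULL if the symlink, library or symbol is missing.
 * The handle is deliberately never closed; a real implementation would
 * cache it per major version.
 */
static void *
icu_lookup(const char *pgdata, int major, const char *base)
{
    char  path[1024];
    char  sym[128];
    void *handle;

    if (!icu_symlink_path(path, sizeof(path), pgdata, major) ||
        !icu_symbol_name(sym, sizeof(sym), base, major))
        return NULL;

    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;

    return dlsym(handle, sym);
}
```

With this layout, icu_lookup(pgdata, 63, "ucol_open") would try to load
<PGDATA>/pg_icu_lib/libicui18n.so.63 and resolve ucol_open_63, and it
would return NULL if the administrator has not created the symlink.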

See attached unfinished patch, which implements some of that. It has
a single collation for en-US-x-icu, and routes calls to different
libraries depending on dynamic scope (which in cold hard reality
translates into a nasty global variable "current_icu_library"). The
idea was that it would normally point to the library we're linked
against, but whenever computing anything related to an index stamped
with ICU 63, we'd do pg_icu_activate_major_version(63), and afterwards
undo that. Performance concerns aside, that now seems a bit too ugly
and fragile to me, and I gave up. How could we convince ourselves
that we'd set the active ICU library correctly in all the required
dynamic scopes, but not leaked it into any other scopes? Does that
even make sense? But if not done like that, how else could we do it?
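
The fragility can be shown with a small sketch of the save/restore
pattern. This is not the code from the attached patch: the names
current_icu_major, activate_icu_major, index_op_scoped and
index_op_leaky are hypothetical, and LINKED_ICU_MAJOR is set to 71
purely as an example of "the version we're linked against".

```c
#include <stdio.h>

/* ASSUMPTION: example value for the ICU version we are linked against. */
#define LINKED_ICU_MAJOR 71

/* The global that every collation call would consult for routing. */
static int current_icu_major = LINKED_ICU_MAJOR;

/* Switch the active version; return the old one so the caller can undo. */
static int
activate_icu_major(int major)
{
    int prev = current_icu_major;

    current_icu_major = major;
    return prev;
}

/* Stand-in for a comparison far down the call stack reading the global. */
static int
version_seen_by_compare(void)
{
    return current_icu_major;
}

/* Well-behaved scope: activate, do the work, restore. */
static int
index_op_scoped(int index_major)
{
    int saved = activate_icu_major(index_major);
    int seen = version_seen_by_compare();

    activate_icu_major(saved);
    return seen;
}

/*
 * Leaky scope: any exit path that skips the restore leaves the global
 * pointing at the index's version for unrelated code that runs later.
 */
static int
index_op_leaky(int index_major, int fail)
{
    int saved = activate_icu_major(index_major);

    if (fail)
        return -1;              /* restore skipped: version leaks */

    activate_icu_major(saved);
    return 0;
}
```

After index_op_leaky(63, 1), version_seen_by_compare() reports 63
instead of the linked version, which is exactly the kind of leak that is
hard to rule out. In PostgreSQL the early exit would more likely be an
ereport(ERROR) unwinding via longjmp, so each such scope would
presumably need PG_TRY/PG_FINALLY or equivalent cleanup, and nothing
forces a caller to remember that.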

Better ideas/code welcome.

Executive summary of experiments so far: the "distinct collations"
concept is quite simple and robust, but exposes all the versions to
users and probably makes it really hard to upgrade (details not worked
out), while the "time travelling collations" concept is nice for users
but hard to pin down and prove correctness for since it seems to
require dynamic scoping/global state changes affecting code in far
away places.

Attachment: v2-0001-WIP-allow-multiple-ICU-libraries.patch (text/x-patch, 23.1 KB)
