Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-09 23:22:37
Message-ID: CA+hUKGJMndLA-uX8N2RmH6AnNB1-W3mPvKE1vuL1BtS9RicVfg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2022 at 9:20 AM Finnerty, Jim <jfinnert(at)amazon(dot)com> wrote:
> Specifying the library name before the language-country code with a new separator (":") as you suggested below has some benefits.

One of the reasons for putting some representation of the desired
library into the colliculocale column (rather than, say, adding a new
column to pg_collation) is that I think we'd also want to be able to
put that into daticulocale (for the database default collation, when
using ICU). But really I just did that because it was easy. Perhaps
both pg_collation and pg_database could gain a new column, and that
would be a little more pleasing from a schema design point of view
(1NF atomicity, and it's a sort of foreign key, or at least it would
be if there were another catalog to list library versions...)?

> Did you consider making the collation version just another collation attribute, such as colStrength, colCaseLevel, etc.?
> For example, an alternate syntax might be:
>
> create collation icu63."en-US-x-icu" (provider = icu, locale = 'en-US@colVersion=63');

Hmm, I hadn't considered that. (I wouldn't call it "col" version BTW;
it's a library version, and we don't want to overload our terminology
for collation version. We'd still be on the lookout for collversion
changes coming from a single library's minor version changing. For
example, an apt-get upgrade can replace the .63 files, which on most
systems are symlinks to .63.1, .63.2 etc. ☠️)

> Was the concern that ICU might redefine a new collation property with the same name in a different and incompatible way (we might work with the ICU developers to agree on what it should be), or that a version is just not the same kind of collation property as the other collation properties?

Well my first impression is that we don't really own that namespace,
and since we're using this to decide which library to route calls to,
it seems nicer to put it at a "higher level" than those properties.
So I'd prefer something like "63:en-US", or 63 in a new column.
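
To make the "63:en-US" form concrete, here is a minimal Python sketch
of splitting such a prefix off before the locale string is handed to
ICU. The function name, and the rule that a missing prefix means "use
the library linked at build time", are assumptions made for
illustration only; they are not taken from the patch.

```python
from typing import Optional, Tuple


def split_library_prefix(value: str) -> Tuple[Optional[int], str]:
    """Split a hypothetical "<major>:<locale>" string into its two parts.

    Returns (None, value) when there is no numeric prefix, which this
    sketch takes to mean "use the library linked at build time".
    """
    prefix, sep, rest = value.partition(":")
    if sep and prefix.isdigit():
        # Numeric prefix found: treat it as the ICU major version.
        return int(prefix), rest
    # No prefix: leave the locale string untouched.
    return None, value
```

With this split, ICU's own keyword namespace (the part after "@") is
never touched, which is the point of keeping the library choice at a
higher level than the collation properties.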

> (in the example above, I'm assuming that for provider = icu, we could translate '63' into 'libicui18n.so.63' automatically.)

Yeah. My patch that jams a library name in there was just the fastest
way I could think of to get something off the ground to test whether I
could route calls to different libraries (yes!), though at one moment
I thought it wasn't terrible. But aside from any aesthetic complaints
about that way of doing it, it turns out not to be enough: we need to
dlopen() two different libraries, because we also need some ctype-ish
functions from this guy:

$ nm -D -C /usr/lib/x86_64-linux-gnu/libicuuc.so.63.1 | grep u_strToUpper
00000000000d22c0 T u_strToUpper_63
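
As a rough illustration of why two libraries are involved, the sketch
below uses Python's ctypes (which calls dlopen() on POSIX systems) to
open both files and look up one versioned symbol from each. It assumes
the libraries were built with ICU's default symbol renaming, so that
each exported name carries the major version as a suffix, as in the nm
output above. The helper names are hypothetical, and ucol_strcoll is
used only as a representative collation entry point.

```python
import ctypes


def versioned_symbol(name: str, major: int) -> str:
    """Build the suffixed symbol name, e.g. u_strToUpper_63."""
    return f"{name}_{major}"


def open_icu(major: int, libicuuc: str, libicui18n: str):
    """Open both ICU libraries and fetch one function from each.

    Raises OSError if a library cannot be opened and AttributeError if
    a symbol is missing. The library names are whatever the caller
    supplies; nothing here guesses platform-specific file names.
    """
    uc = ctypes.CDLL(libicuuc)      # ctype-ish functions live here
    i18n = ctypes.CDLL(libicui18n)  # collation functions live here
    to_upper = getattr(uc, versioned_symbol("u_strToUpper", major))
    str_coll = getattr(i18n, versioned_symbol("ucol_strcoll", major))
    return to_upper, str_coll
```

On a system with ICU 63 installed, a call might look like
open_icu(63, "libicuuc.so.63", "libicui18n.so.63").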

I guess we probably want to just put "63" somewhere in pg_collation,
as you say. But then, teaching PostgreSQL how to expand that to a
name that is platform/packaging dependent seems bad. The variations
would probably be minor: on a Mac it's .dylib, on AIX it may be .a,
the .63 convention may not be universal (I dunno), and some systems
might need absolute paths (depending on ld.so.conf etc). But that's
all stuff that I think an administrator should care about, not us.

Perhaps there could be a new catalog table just for that. So far I
have imagined there would still be one special ICU library linked at
build time, which doesn't need to be dlopen'd, and works automatically
without administrators having to declare it. So a system that has one
linked-in library version 67, and then has two extras that have been
added by an administrator running some new DDL commands might have:

postgres=# select * from pg_icu_library order by version;
 version |    libicuuc    |    libicui18n
---------+----------------+------------------
      58 | libicuuc.so.58 | libicui18n.so.58
      63 | libicuuc.so.63 | libicui18n.so.63
      67 |                |
(3 rows)
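
The lookup implied by that table can be sketched in a few lines of
Python. Everything here is hypothetical: pg_icu_library does not
exist, the dictionary simply mirrors the three example rows, and None
stands in for the empty columns of the linked-in version.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory copy of the example pg_icu_library rows.
# None marks the version linked at build time, which needs no dlopen().
ICU_LIBRARIES: Dict[int, Optional[Tuple[str, str]]] = {
    58: ("libicuuc.so.58", "libicui18n.so.58"),
    63: ("libicuuc.so.63", "libicui18n.so.63"),
    67: None,
}


def resolve_icu_library(major: int) -> Optional[Tuple[str, str]]:
    """Return the (libicuuc, libicui18n) names to dlopen for a version.

    Returns None for the linked-in version and raises LookupError for
    a version that no administrator has declared.
    """
    if major not in ICU_LIBRARIES:
        raise LookupError(f"ICU major version {major} has not been declared")
    return ICU_LIBRARIES[major]
```

The names are used exactly as the administrator declared them, so
suffixes such as .dylib or absolute paths never have to be guessed by
the server.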

Suppose you pg_upgrade to something that is linked against 71.
Perhaps you'd need to tell it how to dlopen 67 before you can open any
collations with that library, but once you've done that your
collation-dependent partition constraints etc should all hold. I
dunno, lots of problems to figure out here, including quite broad ones
about various migration problems. I haven't understood what Peter G
is suggesting about how upgrades might work, so I'll go and try to do
that...
