Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-11-29 04:34:56
Message-ID: CA+hUKGL5cYbrf3DXYNLBV78UXBiOaP-59MAzKFvC7dfT+49pTg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 29, 2022 at 3:55 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> =# select * from pg_icu_collation_versions('en_US') order by
> icu_version;
> icu_version | uca_version | collator_version
> -------------+-------------+------------------
> 50.2 | 6.2 | 58.0.6.50
> 51.3 | 6.2 | 58.0.6.50
> 52.2 | 6.2 | 58.0.6.50
> 53.2 | 6.3 | 137.51
> 54.2 | 7.0 | 137.56
> 55.2 | 7.0 | 153.56
> 56.2 | 8.0 | 153.64
> 57.2 | 8.0 | 153.64
> 58.3 | 9.0 | 153.72
> 59.2 | 9.0 | 153.72
> 60.3 | 10.0 | 153.80
> 61.2 | 10.0 | 153.80
> 62.2 | 11.0 | 153.88
> 63.2 | 11.0 | 153.88
> 64.2 | 12.1 | 153.97
> 65.1 | 12.1 | 153.97
> 66.1 | 13.0 | 153.14
> 67.1 | 13.0 | 153.14
> 68.2 | 13.0 | 153.14
> 69.1 | 13.0 | 153.14
> 70.1 | 14.0 | 153.112
> (21 rows)
>
> This is good information, because it tells us that major library
> versions change more often than collation versions, empirically-
> speaking.
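To put a number on that observation, the rows above can be grouped by collator version. This is a throwaway Python sketch: the data is only the en_US output transcribed by hand, nothing here calls ICU, and the helper name is made up for illustration.

```python
# Rows transcribed by hand from the en_US output quoted above:
# (icu_version, uca_version, collator_version)
ROWS = [
    ("50.2", "6.2", "58.0.6.50"), ("51.3", "6.2", "58.0.6.50"),
    ("52.2", "6.2", "58.0.6.50"), ("53.2", "6.3", "137.51"),
    ("54.2", "7.0", "137.56"), ("55.2", "7.0", "153.56"),
    ("56.2", "8.0", "153.64"), ("57.2", "8.0", "153.64"),
    ("58.3", "9.0", "153.72"), ("59.2", "9.0", "153.72"),
    ("60.3", "10.0", "153.80"), ("61.2", "10.0", "153.80"),
    ("62.2", "11.0", "153.88"), ("63.2", "11.0", "153.88"),
    ("64.2", "12.1", "153.97"), ("65.1", "12.1", "153.97"),
    ("66.1", "13.0", "153.14"), ("67.1", "13.0", "153.14"),
    ("68.2", "13.0", "153.14"), ("69.1", "13.0", "153.14"),
    ("70.1", "14.0", "153.112"),
]


def icu_majors_by_collator_version(rows):
    """Group ICU major versions by the collator version they report."""
    groups = {}
    for icu, _uca, coll in rows:
        groups.setdefault(coll, []).append(int(icu.split(".")[0]))
    return groups


groups = icu_majors_by_collator_version(ROWS)
# -> 21 ICU majors, 11 distinct collator versions
print(len(ROWS), "ICU majors,", len(groups), "distinct collator versions")
for coll, majors in groups.items():
    print(f"{coll:>10}  ICU {majors}")
```

Most collator versions are shared by two or more consecutive ICU majors, e.g. 153.14 by ICU 66 through 69.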

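Wow, nice discovery about 104 -> 14: for UCA 13.0 (ICU 66-69) the
second component comes out as 14, where the major * 8 + minor pattern
visible in the other rows would give 104. Yeah, I imagine we'll want
some kind of band-aid to tolerate that exact screwup and avoid spurious
warnings.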
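If the pattern holds, UCA 13.0 should encode as 13 * 8 + 0 = 104, yet ICU 66-69 report 153.14. One guess, not established anywhere in this thread, is a version-to-string routine that prints a hundreds digit and then skips a zero tens digit. The Python sketch below reproduces the symptom under that assumption and shows one possible band-aid: a comparison that tolerates exactly that rendering. All function names are hypothetical; none of them are ICU or PostgreSQL APIs.

```python
def buggy_field_to_str(field: int) -> str:
    """HYPOTHETICAL formatter for one version field (0..255).

    Emits a hundreds digit, then a tens digit only if the remainder is
    still >= 10, so a zero tens digit is silently dropped: 104 -> "14".
    """
    out = ""
    if field >= 100:
        out += str(field // 100)
        field %= 100
    if field >= 10:
        out += str(field // 10)
        field %= 10
    return out + str(field)


def version_to_str(fields, fmt=str) -> str:
    """Join version fields with dots using the given per-field formatter."""
    return ".".join(fmt(f) for f in fields)


def fields_match(a: str, b: str) -> bool:
    """Band-aid: equal, or one is the dropped-zero rendering of the other."""
    return (a == b
            or buggy_field_to_str(int(a)) == b
            or buggy_field_to_str(int(b)) == a)


def versions_match(v1: str, v2: str) -> bool:
    """Compare dotted version strings field by field, tolerating the quirk."""
    p1, p2 = v1.split("."), v2.split(".")
    return len(p1) == len(p2) and all(fields_match(x, y) for x, y in zip(p1, p2))


uca_13_0 = 13 * 8 + 0                                       # 104
print(version_to_str([153, uca_13_0]))                      # 153.104
print(version_to_str([153, uca_13_0], buggy_field_to_str))  # 153.14
print(versions_match("153.14", "153.104"))                  # True
print(versions_match("153.14", "153.112"))                  # False
```

The cost of this kind of band-aid is that a genuine 14 and a genuine 104 become indistinguishable; whether that matters depends on which values can actually occur.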

Bugs aside, that's quite a revealing table in other ways. We can see:

* The version scheme changed completely in ICU 53. This corresponds
to a major rewrite of the collation code, I see [1].

* The first component seems to be (UCOL_RUNTIME_VERSION << 4) + 9.
UCOL_RUNTIME_VERSION is in their uvernum.h, currently 9, was 8, bumped
between 54 and 55 (I see this in their commit log), which accounts for
the two values we see there: (8 << 4) + 9 = 137 and (9 << 4) + 9 = 153.
I don't know where the final 9 term is coming from, but it looks stable
since the v2 collation rewrite landed.

* The second component seems to be uca_version_major * 8 +
uca_version_minor, e.g. 12.1 -> 12 * 8 + 1 = 97 and 14.0 -> 112
(that's the Unicode Collation Algorithm version, which so far always
matches the Unicode version visible in the output of the other
function).

* The values you showed for English don't have a third component, but
if you try some other locales like 'zh' you'll see the CLDR major
version in third position. So I guess some locales depend on CLDR
data and others don't.
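The inferred layout for ICU 53 and later can be written down and checked against a few of the en_US rows. This is a sketch of the observed pattern only, not ICU's actual code: the constant 9 is copied from the table without knowing its source, the per-release UCOL_RUNTIME_VERSION values are taken from the "bumped between 54 and 55" observation, and `predicted_collator_version` is a made-up name.

```python
def predicted_collator_version(runtime_version: int, uca_major: int,
                               uca_minor: int, cldr_major=None) -> str:
    """Version string implied by the pattern inferred for ICU >= 53.

    first  = (UCOL_RUNTIME_VERSION << 4) + 9   (source of the 9 unknown)
    second = uca_major * 8 + uca_minor
    third  = CLDR major version, present only for some locales
    """
    fields = [(runtime_version << 4) + 9, uca_major * 8 + uca_minor]
    if cldr_major is not None:
        fields.append(cldr_major)
    return ".".join(str(f) for f in fields)


# (ICU major, UCOL_RUNTIME_VERSION, UCA version, reported en_US version)
CHECKS = [
    (53, 8, (6, 3), "137.51"),
    (54, 8, (7, 0), "137.56"),
    (55, 9, (7, 0), "153.56"),
    (64, 9, (12, 1), "153.97"),
    (66, 9, (13, 0), "153.14"),   # the 104 -> 14 oddity
    (70, 9, (14, 0), "153.112"),
]

for icu, runtime, (major, minor), reported in CHECKS:
    predicted = predicted_collator_version(runtime, major, minor)
    print(icu, predicted, "ok" if predicted == reported else "MISMATCH")
```

Every row checked matches except ICU 66, where the prediction is 153.104 against the reported 153.14.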

TL;DR it *looks* like the set of ingredients for the version string is:

* UCOL_RUNTIME_VERSION (rarely changes)
* UCA/Unicode major.minor version
* sometimes CLDR major version, not sure when
* 9 (a constant in the low four bits of the first component, origin
unknown)
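Going the other way, those ingredients suggest a way to tell what kind of change a version bump represents, which might help decide when a warning is warranted. This is a minimal Python sketch assuming the inferred layout holds: it rejects strings without 9 in the low four bits of the first component (such as the pre-53 "58.0.6.50" form) and assumes UCA minor versions stay below 8. `decode` and `what_changed` are hypothetical helpers, not PostgreSQL or ICU functions.

```python
from typing import NamedTuple, Optional, Tuple


class Ingredients(NamedTuple):
    runtime: int               # inferred UCOL_RUNTIME_VERSION
    uca: Tuple[int, int]       # (major, minor)
    cldr: Optional[int]        # only present for some locales


def decode(version: str) -> Optional[Ingredients]:
    """Split a post-ICU-53 collator version into its inferred ingredients."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) not in (2, 3) or (parts[0] & 0xF) != 9:
        return None  # e.g. the pre-53 "58.0.6.50" layout
    runtime = parts[0] >> 4
    uca = divmod(parts[1], 8)  # assumes UCA minor < 8
    cldr = parts[2] if len(parts) == 3 else None
    return Ingredients(runtime, uca, cldr)


def what_changed(old: str, new: str) -> list:
    """Name the ingredients that differ between two version strings."""
    a, b = decode(old), decode(new)
    if a is None or b is None:
        return ["unrecognized format"]
    return [name for name, x, y in zip(Ingredients._fields, a, b) if x != y]


print(decode("153.112"))                  # Ingredients(runtime=9, uca=(14, 0), cldr=None)
print(what_changed("137.56", "153.56"))   # ['runtime']
print(what_changed("153.97", "153.112"))  # ['uca']
```

The ICU 66-69 strings decode to nonsense here ("153.14" comes out as UCA 1.6), so the 104 -> 14 case would need special handling before anything like this could be trusted.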

[1] https://icu.unicode.org/design/collation/v2
