Re: Collation version tracking for macOS

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-11-29 04:48:58
Message-ID: 83faecb4a89dfb5794938e7b4d9f89daf4c5d631.camel@j-davis.com
Lists: pgsql-hackers

On Mon, 2022-11-28 at 14:11 -0500, Robert Haas wrote:
> I don't really understand #1 or #5 well enough to have an educated
> opinion, but I do think that #1 seems a bit magical. It hopes that the
> combination of a collation name and a datcollversion will be
> sufficient to find exactly one matching collation in a list of
> provided libraries. The advantage of that, as I understand it, is that
> if you do something to your system that causes the number of matches
> to go from one to zero, you can just throw another library on the pile
> and get the number back up to one. Woohoo! But there's a part of me
> that worries: what if the number goes up to two, and they're not all
> the same? Probably that's something that shouldn't happen, but if it
> does then I think there's kind of no way to fix it. With the other
> options, if there's some way to jigger the catalog state to match what
> you want to happen, you can always repair the situation somehow,
> because the library to be used for each collation is explicitly
> specified in some way, and you just have to get it to match what you
> want to have happen.

Not necessarily. #2-4 (at least as implemented in v7) can only load one
major version at a time, so you can't specify minor versions:
https://www.postgresql.org/message-id/9f8e9b5a3352478d4cf7d6c0a5dd7e82496be4b6.camel@j-davis.com

With #1, you can provide control over the search order to find the
symbol you want. Granted, if you want to specify that different
collations look in different libraries for the same version, then it
won't work, because the search order is global -- is that what you're
worried about? If so, I think we need to compare it against the
downsides of #2-4, which in my opinion are more serious.
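
To illustrate the search-order idea behind #1, here is a minimal sketch.
It is not the patch: the library labels, the find_library() helper, and
the version numbers are hypothetical placeholders, and a plain dict
stands in for whatever ucol_getVersion() would report for each library.
It only shows that, when more than one library reports the collation
version recorded in the catalog, a global search order decides which
one is used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: (library label, {locale: reported collation version}).
# The list order stands in for a single, global search order.
Library = Tuple[str, Dict[str, str]]


def find_library(search_order: List[Library],
                 locale: str,
                 catalog_collversion: str) -> Optional[str]:
    """Return the first library whose collation version matches the catalog."""
    for label, versions in search_order:
        if versions.get(locale) == catalog_collversion:
            return label
    return None  # no match: the user would have to add another library


# Illustrative data only; two libraries report the same collation version.
libs: List[Library] = [
    ("libA", {"en_US": "2.0"}),
    ("libB", {"en_US": "2.0"}),
    ("libC", {"en_US": "1.0"}),
]

first_match = find_library(libs, "en_US", "2.0")             # "libA"
reordered = find_library(libs[::-1], "en_US", "2.0")         # "libB"
no_match = find_library(libs, "en_US", "3.0")                # None
```

Because the order is global, two collations that both want version
"2.0" cannot be pointed at different libraries in this model, which is
the limitation described above.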

The first thing to sort out with options #2-4 is: what about minor
versions? V7 took the approach that only the major version matters.
That means that if you want to select a specific minor version, then
you are out of luck, because only one major at a time can be loaded,
globally. But paying attention to minor versions seems like a mess --
we'd need even more magical fallbacks that try later minor versions or
something.

Second, there is weirdness in the common case where a collation version
doesn't change between ICU versions. Let's say you have a collation
"mycoll" with locale "en_US" and it's pointed at built-in library
version 64, with collation version 153.97. The GUC
default_icu_library_version is set to 63. Then you upgrade the system
and ICU gets updated from 64 -> 65. Now, it can't find version 64 to
load, so it falls back to 63 (which has the wrong version 153.88), even
though 65 is just fine because it still offers that locale with version
153.97. (A similar problem exists when you remove a version of ICU from
icu_library_path, and another version suffices for all of your
collations.)
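
The scenario above can be played out in a small simulation. This is
only a sketch under stated assumptions: the two picker functions are
hypothetical stand-ins for the lookup styles of #2-4 and #1, the dict
replaces real ICU libraries, and the "newest first" search order in the
second function is an arbitrary choice for the example. The version
numbers are the ones from the example.

```python
from typing import Dict, Optional, Tuple

# ICU major version -> collation version reported for "en_US".
# After the upgrade, 64 is gone; 63 and 65 are what can be loaded.
AVAILABLE: Dict[int, str] = {63: "153.88", 65: "153.97"}


def pick_by_library_version(pinned_major: int,
                            default_major: int,
                            available: Dict[int, str]) -> Tuple[int, str]:
    """Style of #2-4: use the pinned major version, else the default one.

    Assumes the default major version is always loadable.
    """
    major = pinned_major if pinned_major in available else default_major
    return major, available[major]


def pick_by_collation_version(wanted: str,
                              available: Dict[int, str]) -> Optional[int]:
    """Style of #1: any library reporting the wanted collation version."""
    for major in sorted(available, reverse=True):  # assumed order: newest first
        if available[major] == wanted:
            return major
    return None


# "mycoll" was pinned to library 64 with collation version 153.97;
# default_icu_library_version is 63.
fallback = pick_by_library_version(64, 63, AVAILABLE)
matched = pick_by_collation_version("153.97", AVAILABLE)
```

Here fallback is (63, "153.88"), a collation version mismatch, while
matched is 65, which still reports 153.97.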

Third, as I said earlier, it's just hard on the user to try to sort
out two different versions modeled in the database. Understanding
encodings and collations is hard enough, and then we introduce *two*
versions on top of that.

Fourth, I don't see what the point of ucol_getVersion() is in schemes
#2-4. All it does is control a WARNING, because throwing an error (at
least by default) would be too harsh, given that users have lived with
these risks for so long. But if all it does is throw a warning, what's
the point in modeling it in the catalog as though it's the most
important version?

Ultimately, I think collation version (as reported by
ucol_getVersion()) is the most accurate and least-surprising way to
match a library-provided collation with the collation in the catalog.
And it seems like we'd be using it in exactly the way the ICU
maintainers intend it to be used.

Of course, I cast my vote for #1 before I discovered this ICU bug:
https://www.postgresql.org/message-id/0f7922d4f411376f420ec9139febeae4cdc748a6.camel@j-davis.com

That injects some doubt, to be sure. If I were to try to solve the
problems with #2-4, one approach might be to treat the built-in ICU
version differently from the ones in icu_library_path. I'm not quite
sure; I'd have to think more. But as of now, I'd still lean toward #1
until a better option is presented.

Regards,
Jeff Davis
