Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-10-22 01:22:03
Message-ID: CA+hUKGKq=iLH3bY+nK7v8b2zBCuKOk-fe0cP0it2RxNaWFVxYA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> ... But it
> doesn't provide a way for me to create a new database that uses 63 on
> purpose when I know what I'm doing. There are various reasons I might
> want to do that.

Thinking some more about this, I guess that could be addressed by
having an explicit way to request either the library version or
collversion-style version when creating a database or collation, but
not actually storing it in daticulocale/colliculocale. That could be
done either as part of the string that is trimmed off before storing
it (so it's only used briefly during creation to find a non-default
library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU
library version) or "153.14:en" (individual collation version) or some
new syntax in a few places. Thereafter, it would always be looked up
by searching for the right library by [dat]collversion as Peter E
suggested.
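
To make the creation-time syntax idea concrete, here is a rough sketch of how such a prefix might be split off before the locale is stored. Everything here is hypothetical: the function and type names are invented, and the rule "all digits means library version, dotted digits means collversion" is only an assumption about how the two forms could be told apart.

```python
from typing import NamedTuple, Optional


class IcuLocaleSpec(NamedTuple):
    library_version: Optional[str]  # e.g. "67" (ICU major version), if given
    collversion: Optional[str]      # e.g. "153.14" (collation version), if given
    locale: str                     # the part that would actually be stored


def parse_icu_locale_spec(spec: str) -> IcuLocaleSpec:
    """Split a hypothetical 'VERSION:LOCALE' string into its parts."""
    prefix, sep, rest = spec.partition(":")
    if not sep:
        # No prefix at all: plain locale, nothing to trim.
        return IcuLocaleSpec(None, None, spec)
    if not rest:
        raise ValueError(f"missing locale after prefix in {spec!r}")
    if prefix.isdigit():
        # Assumption: a bare integer names an ICU library major version.
        return IcuLocaleSpec(prefix, None, rest)
    parts = prefix.split(".")
    if len(parts) > 1 and all(p.isdigit() for p in parts):
        # Assumption: dotted digits name an individual collation version.
        return IcuLocaleSpec(None, prefix, rest)
    raise ValueError(f"unrecognised version prefix in {spec!r}")
```

With this sketch, "67:en" yields a library version of "67" and "153.14:en" yields a collversion of "153.14"; in both cases only "en" would end up in the catalogue.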

Let me try harder to vocalise some more thoughts that have stopped me
from trying to code the search-by-collversion design so far:

Suppose your pgdata encounters a PostgreSQL linked against a later ICU
library, most likely after an OS upgrade or migration, a pg_upgrade,
or via streaming replication. You might get a new error "can't find
ICU collation 'en' with version '153.14'; HINT: install missing ICU
library version", and somehow you'll have to work out which one might
contain 'en' v153.14 and install it with apt-get etc. Then it'll
magically work: your postgres linked against (say) 71 will happily
work with the dlopen'd 67. This is enough if you want to stay on 67
until the heat death of the universe. So far so good.
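
As a toy model of that lookup (not the patch's code), the search-by-collversion idea could be pictured like this. The inventory dict stands in for whatever dlopen-based probing would really happen, and the collversion shown for 71 is a made-up placeholder.

```python
class CollationNotFound(Exception):
    """Raised when no installed ICU library provides the recorded version."""


# Hypothetical inventory: ICU major version -> {locale: collversion}.
# "153.14" for 67 matches the example above; the value for 71 is invented.
INSTALLED = {
    "71": {"en": "999.1"},
    "67": {"en": "153.14"},
}


def find_library(installed, locale, collversion):
    """Return the ICU major version whose collation matches collversion."""
    for lib_version, collations in installed.items():
        if collations.get(locale) == collversion:
            return lib_version
    raise CollationNotFound(
        f"can't find ICU collation '{locale}' with version '{collversion}'; "
        "HINT: install missing ICU library version"
    )
```

Here find_library(INSTALLED, "en", "153.14") picks "67" even though 71 is also installed, and removing the "67" entry reproduces the error case.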

Problem 1: Suppose you're ready to start using (say) v72. I guess
you'd use the REFRESH command, which would open the main linked ICU's
collversion and stamp that into the catalogue, at which point new
sessions would start using that, and then you'd have to rebuild all
your indexes (with no help from PG to tell you how to find everything
that needs to be rebuilt, as belaboured in previous reverted work).
Aside from the possibility of getting the rebuilding job wrong (as
belaboured elsewhere), it's not great, because there is still a
transitional period where you can be using the wrong version for your
data. So this requires some careful planning and understanding from
the administrator.

I admit that the upgrade story is a tiny bit better than the v5
DB2-style patch, which starts using the new version immediately if you
didn't use a prefix (and logs the usual warnings about collversion
mismatch) instead of waiting for you to run REFRESH. But both of them
have a phase where they might use the wrong library to access an
index. That's dissatisfying, and leads me to prefer the simple
DB2-style solution that at least admits up front that it's not very
clever. The DB2-style patch could be improved a bit here with the
addition of one more GUC: default_icu_library, so the administrator,
rather than the packager, remains in control of which version we use
for non-prefixed iculocale values (likely to be what almost everyone
is interested in), defaulting to what the packager linked against.
I've added that to the patch for illustration (though obviously the
error messages produced by collversion mismatch could use some
adjustment, i.e. to clarify that the warning might be cleared by
installing and selecting a different library version).
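
Roughly, the selection rule with that extra GUC could be sketched as follows. This is only an illustration under assumptions: the function name is invented, and the "NN:locale" prefix form is borrowed from the syntax floated earlier rather than taken from the patch.

```python
def choose_icu_library(iculocale, default_icu_library, linked_version):
    """Return (library_version, locale) for a stored iculocale value.

    Assumed rule: an explicit numeric prefix wins; otherwise the
    administrator's default_icu_library setting is used; if that is
    unset, fall back to the version the packager linked against.
    """
    prefix, sep, locale = iculocale.partition(":")
    if sep and prefix.isdigit():
        return prefix, locale
    return (default_icu_library or linked_version), iculocale
```

So a non-prefixed "en" follows default_icu_library when it is set (say "67") and the linked version (say "71") when it is not, while "63:en" always asks for 63.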

Problem 2: If ICU 67 ever decides to report a different version for a
given collation (would it ever do that? I don't expect so, but ...),
we'd be unable to open the collation with the search-by-collversion
design, and potentially the database. What is a user supposed to do
then? Presumably our error/hint for that would be "please insert the
correct ICU library into drive A", but now there is no correct
library; if you can even diagnose what's happened, you might be able
to downgrade the ICU library using package tools, but if you just
can't get the right library, you'd be stuck.
Is this a problem? Would you want to be able to say "I don't care,
computer, please just press on"? So I think we need a way to turn off
the search-by-collversion thing. How should it look?
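
One possible shape for such an off switch, purely as a strawman: a boolean that downgrades the hard error to a warning and presses on with the linked library. The names (open_collation, search_by_collversion) and the dict-based inventory are all hypothetical.

```python
import warnings


def open_collation(installed, linked_version, locale, recorded_collversion,
                   search_by_collversion=True):
    """Pick an ICU library version for a collation.

    installed maps ICU major version -> {locale: collversion} (a stand-in
    for real library probing). With the search enabled, a missing version
    is a hard error; with it disabled, we warn and use the linked library.
    """
    if search_by_collversion:
        for lib_version, collations in installed.items():
            if collations.get(locale) == recorded_collversion:
                return lib_version
        raise LookupError(
            f"can't find ICU collation '{locale}' "
            f"with version '{recorded_collversion}'"
        )
    actual = installed.get(linked_version, {}).get(locale)
    if actual != recorded_collversion:
        warnings.warn(
            f"collation '{locale}' version mismatch: "
            f"recorded {recorded_collversion}, library reports {actual}"
        )
    return linked_version
```

If 67 started reporting, say, "153.15" for 'en', the strict mode would fail as described above, while the relaxed mode would return "67" with a mismatch warning.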

I'd love to hear others' thoughts on how we can turn this into a
workable solution. Hopefully while staying simple...

Attachment: v6-0001-WIP-Multi-version-ICU.patch (text/x-patch, 32.7 KB)
