Re: Collation version tracking for macOS

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2023-01-10 09:57:36
Message-ID: 0e1bcd64-b32b-a9ef-4d65-fe420d10e5b3@enterprisedb.com
Lists: pgsql-hackers

On 05.12.22 22:33, Thomas Munro wrote:
> On Tue, Dec 6, 2022 at 6:45 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
>> On 12/5/22 12:41, Jeff Davis wrote:
>>> On Mon, 2022-12-05 at 16:12 +1300, Thomas Munro wrote:
>>>> 1. I think we should seriously consider provider = ICU63. I still
>>>> think search-by-collversion is a little too magical, even though it
>>>> clearly can be made to work. Of the non-magical systems, I think
>>>> encoding the choice of library into the provider name would avoid the
>>>> need to add a second confusing "X_version" concept alongside our
>>>> existing "X_version" columns in catalogues and DDL syntax, while still
>>>> making it super clear what is going on.
>>>
>>> As I understand it, this is #2 in your previous list?
>>>
>>> Can we put the naming of the provider into the hands of the user, e.g.:
>>>
>>> CREATE COLLATION PROVIDER icu63 TYPE icu
>>> AS '/path/to/libicui18n.so.63', '/path/to/libicuuc.so.63';
>>>
>>> In this model, icu would be a "provider kind" and icu63 would be the
>>> specific provider, which is named by the user.
>>>
>>> That seems like the least magical approach, to me. We need an ICU
>>> library; the administrator gives us one that looks like ICU; and we're
>>> happy.
>>
>> +1
>>
>> I like this. The provider kind defines which path we take in our code,
>> and the specific library unambiguously defines a specific collation
>> behavior (I think, ignoring bugs?)
>
> OK, I'm going to see what happens if I try to wrangle that stuff into
> a new catalogue table.

I'm reviewing the commit fest entry
https://commitfest.postgresql.org/41/3956/, which points to this thread.
It appears that the above patch did not come about in time. The patch
of record is now Jeff's refactoring patch, which is also tracked in
another commit fest entry (https://commitfest.postgresql.org/41/4058/).
So as a matter of procedure, we should probably close this commit fest
entry for now. (Maybe we should also use a different thread subject in
the future.)

I have a few quick comments on the above syntax example:

There is currently a bunch of locale-using code that selects different
code paths by "collation provider", i.e., a libc-based code path and an
ICU-based code path (and sometimes also a default provider path). The
above proposal would shift the terminology and would probably require
some churn at those sites, in that they would now have to select by
"collation provider type". We could probably avoid that by shifting the
terms a bit, so instead of the suggested

provider type -> provider

we could use

provider -> version of that provider

(or some other actual term), which would leave the meaning of "provider"
unchanged as far as locale-using code is concerned. At least that's my
expectation, since no code for this has been seen yet. We should keep
this in mind in any case.
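
To illustrate the second naming, here is a rough sketch. It is purely
hypothetical: the type and function names are made up for this example
and are not taken from PostgreSQL or from any posted patch. The point is
only that the call sites keep switching on "provider", while the choice
of a specific library rides along as a separate "version" attribute.

```c
/*
 * Hypothetical sketch; none of these names exist in PostgreSQL.
 * "provider" keeps its current meaning (libc vs. ICU), and the specific
 * library is identified by a separate provider_version field.
 */
typedef enum
{
    PROVIDER_LIBC,
    PROVIDER_ICU
} CollProvider;

typedef struct
{
    CollProvider provider;          /* what locale-using code switches on */
    const char *provider_version;   /* e.g. "63"; unused for libc here */
} CollationDesc;

/*
 * Stand-in for a locale-using call site: it selects a code path by
 * provider only and never needs to look at provider_version.
 */
const char *
compare_path(const CollationDesc *coll)
{
    switch (coll->provider)
    {
        case PROVIDER_LIBC:
            return "libc";
        case PROVIDER_ICU:
            return "icu";
    }
    return "unknown";
}
```

With this shape, two collations that use different ICU libraries (say
"63" and "71") would take the same code path at every such site, so
those sites would not need to change.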

Also, the above example exposes a lot of operating system level details.
This creates issues with dump/restore, which some of the earlier
patches avoided by using a path-based approach, and it would also
require some thought about permissions. We probably want
non-superusers to be able to interact with this system somehow, e.g.,
to upgrade indexes (for some meaning of that action), without
superuser access. The more stuff from the OS we expose, the more stuff
we have to be able to lock down again in a usable manner.

(The search-by-collversion approach can probably avoid those issues better.)
