Re: ICU for global collation

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ICU for global collation
Date: 2022-01-10 11:49:07
Message-ID: 809c0fba-b0e6-44bf-b110-0886cb462aab@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:

> Unlike in the previous patch, where the ICU
> collation name was written in datcollate, there is now a third column
> (daticucoll), so we can store all three values.

I think some users would want their db-wide ICU collation to be
case/accent-insensitive. Postgres users are trained to expect
case-sensitive comparisons, but some apps initially made for
e.g. MySQL or MS-SQL that use such collations natively would be easier
to port to Postgres.
IIRC, that was the context for some questions where people were
enquiring about db-wide ICU collations.

With the current patch, it's not possible, AFAICS, because the user
can't tell that the collation is non-deterministic. Presumably this
would require another option to CREATE DATABASE and another
column to store that bit of information.

The "daticucol" column also suggests that we don't expect to add
other collation providers in the future. Maybe a pair of columns like
(datcollprovider, datcolllocale) would be more future-proof,
or a (datcollprovider, datcolllocale, datcollisdeterministic)
triplet if non-deterministic collations are allowed.

Also, pg_collation has "collversion" to detect a mismatch between
the ICU runtime and existing indexes. I don't see that field
for the db-wide ICU collation, so maybe we currently miss the capability
to detect the mismatch for the db-wide collation?

The lack of these fields overall suggest the idea that when CREATE
DATABASE is called with a global ICU collation, what if it somehow
inserted the collation into pg_collation in the new database?
Then pg_database would just store the collation oid and no other
collation-related field would need to be added into it, now
or in the future.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-01-10 11:50:16 Re: Skipping logical replication transactions on subscriber side
Previous Message Thomas Munro 2022-01-10 11:29:44 Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?