From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in case-insensitive collation pg_unicode_ci |
Date: | 2025-09-24 15:10:45 |
Message-ID: | f3b42d3ccef71f431f3c8ea436422f3b87867527.camel@cybertec.at |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 2025-09-19 at 17:21 -0700, Jeff Davis wrote:
> --------
> Proposal
> --------
>
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
> strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.
I think that this is interesting.
> ----------
> Motivation
> ----------
>
> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work. See:
>
> However, PG_UNICODE_CI collation does have character-by-character
> semantics which are well-defined for pattern matching.
>
> That takes us a step closer to allowing the database default collation
> to be case-insensitive.
What is still missing for that? Pattern matching?
> ----------
> Versioning
> ----------
>
> Unlike other built-in collations, the order does depend on the version
> of Unicode, so the collation is given a version equal to the version of
> Unicode. (Other builtin collations have a version of "1".)
>
> That means that indexes, including primary keys, can become
> inconsistent after a major version upgrade if the version of Unicode
> has changed. The conditions where this can happen are much narrower
> than with libc or ICU collations:
>
> (a) The database in the prior version must contain code points
> unassigned as of that version; and
> (b) Some of those previously-unassigned code points must be assigned
> to a Cased character in the newer version.
That's an improvement for people who are ready to perform a test upgrade
and check if any indexes are corrupted - they will likely see that none
are, so no index needs to be rebuilt.
I tried your patch.
It works as advertised, and I didn't manage to break it.
Yours,
Laurenz Albe
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2025-09-24 15:13:44 | Re: "openssl" should not be optional |
Previous Message | Peter Eisentraut | 2025-09-24 15:05:02 | Re: Remove PointerIsValid() |