Re: Built-in case-insensitive collation pg_unicode_ci

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in case-insensitive collation pg_unicode_ci
Date: 2025-10-16 13:46:30
Message-ID: 76d9a422-2e15-4300-9c6d-47a7c3d00caa@eisentraut.org
Lists: pgsql-hackers

On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
> strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.
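
The quoted ordering rule can be approximated outside PostgreSQL. The
following is only a rough sketch in Python, assuming that Python's
str.casefold() (Unicode full case folding) is a reasonable stand-in for
CASEFOLD() and that comparing UTF-8 bytes stands in for strcmp(). The
helper names casefold_key and ci_compare are hypothetical, and the result
may differ from the proposed collation in details.

```python
def casefold_key(s: str) -> bytes:
    # Assumption: str.casefold() approximates CASEFOLD(); UTF-8 encoding
    # gives bytes whose ordering mimics a bytewise strcmp().
    return s.casefold().encode("utf-8")


def ci_compare(a: str, b: str) -> int:
    # Returns a negative, zero, or positive value, like strcmp().
    ka, kb = casefold_key(a), casefold_key(b)
    return (ka > kb) - (ka < kb)
```

With this sketch, ci_compare("Straße", "STRASSE") returns 0, because both
strings fold to the same sequence, and sorted(names, key=casefold_key)
orders a list by the same rule.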

If it's a variant of PG_UNICODE_FAST, then it ought to be called
PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a
variant of PG_UNICODE (if that existed, but there is also UNICODE).

But that name is also dubious since you later write that it's not
actually fast.

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and
ILIKE is kind of stupid, and regexes have their own way to control
case-sensitivity.

Nevertheless, I think there would be some value in providing CI (and maybe
accent-insensitive?) collations that operate separately from the
"nondeterministic" mechanism. But then I would like to see a
comprehensive approach that covers a variety of providers and locales.
For example, I would expect there to be something like a "sv_SE_CI"
locale, either available by default or easily created.
