From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in case-insensitive collation pg_unicode_ci |
Date: | 2025-10-16 13:46:30 |
Message-ID: | 76d9a422-2e15-4300-9c6d-47a7c3d00caa@eisentraut.org |
Lists: | pgsql-hackers |

On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
> strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.

If it's a variant of PG_UNICODE_FAST, then it ought to be called
PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a
variant of PG_UNICODE (if that existed, but there is also UNICODE).
But that name is also dubious since you later write that it's not
actually fast.
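
For illustration, the quoted ordering semantics can be sketched outside
the server. This is only a rough model, not the patch's code: it assumes
Python's str.casefold() (Unicode full case folding) is a fair stand-in
for CASEFOLD(), and that strcmp() on UTF-8 amounts to a bytewise
comparison. The name casefold_cmp is made up for this sketch.

```python
def casefold_cmp(a: str, b: str) -> int:
    """Three-way comparison modeling strcmp(CASEFOLD(a), CASEFOLD(b)).

    Returns -1, 0, or 1. Hypothetical helper, not PostgreSQL code.
    """
    # Assumption: str.casefold() approximates the proposal's CASEFOLD().
    key_a = a.casefold().encode("utf-8")
    key_b = b.casefold().encode("utf-8")
    # Bytewise comparison of UTF-8 stands in for strcmp() here.
    return (key_a > key_b) - (key_a < key_b)


if __name__ == "__main__":
    print(casefold_cmp("Straße", "STRASSE"))
    print(casefold_cmp("abc", "ABD"))
```

Under these assumptions, strings that differ only in case compare as
equal, and everything else falls back to plain code point order of the
folded strings, with no locale-specific rules involved.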

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and
ILIKE is kind of stupid, and regexes have their own way to control
case-sensitivity.

Nevertheless, I think there would be some value in providing CI (and maybe
accent-insensitive?) collations that operate separately from the
"nondeterministic" mechanism. But then I would like to see a
comprehensive approach that covers a variety of providers and locales.
For example, I would expect there to be something like a "sv_SE_CI"
locale, either available by default or easily created.