Re: Add CASEFOLD() function.

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>, Joe Conway <mail(at)joeconway(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: Add CASEFOLD() function.
Date: 2025-06-17 18:14:58
Message-ID: af6b575335ffeea36db3189fe7031adf67c250d2.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
> If the character set of <character factor> is UTF8, UTF16, or UTF32,
> then FR is replaced by
>      Case:
>          i) If the <search condition> S IS NORMALIZED evaluates to
> True, then NORMALIZE (FR)
>          ii) Otherwise, FR.

I read that as "if the input is normalized, then the output should be
normalized", IOW preserve the normalization. But does it mean "preserve
whatever the input normal form is" or "preserve NFC if the input is
NFC, otherwise the normalization is undefined"?

The above wording seems to mean "preserve NFC if the input is NFC",
because NORMALIZE(FR) defaults to NFC when no normal form is
specified.
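
As a rough illustration of that reading, here is a minimal Python sketch of the quoted rule. It uses Python's unicodedata module and str.casefold() as stand-ins for the SQL operations, and it assumes that an unqualified IS NORMALIZED and NORMALIZE both mean NFC. The function name fold_per_standard is hypothetical, and none of this is PostgreSQL code.

```python
import unicodedata

def fold_per_standard(s: str) -> str:
    """Hypothetical illustration of the quoted rule, taking NFC as the default form."""
    fr = s.casefold()  # stand-in for the fold result FR
    # "S IS NORMALIZED" with no form given is assumed here to mean NFC.
    if unicodedata.is_normalized("NFC", s):
        return unicodedata.normalize("NFC", fr)  # case i): NORMALIZE(FR)
    return fr  # case ii): FR is returned as-is

nfc_input = unicodedata.normalize("NFC", "\u00c1\u00df\u0301")
nfd_input = unicodedata.normalize("NFD", "\u00c1\u00df\u0301")

nfc_result = fold_per_standard(nfc_input)
nfd_result = fold_per_standard(nfd_input)

# NFC input: the result is renormalized, so it is NFC again.
print(unicodedata.is_normalized("NFC", nfc_result))
# NFD input: it is not NFC, so the raw fold result comes back untouched.
print(unicodedata.is_normalized("NFC", nfd_result))
```

Under this reading, only NFC input gets its normalization restored; input in any other form falls through to case ii) and is returned exactly as the fold produced it.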

> It does not appear to me that our LOWER and UPPER functions obey this
> rule,

You are correct:

WITH s(t) AS
(SELECT NORMALIZE(U&'\00C1\00DF\0301' COLLATE "en-US-x-icu"))
SELECT UPPER(t) = NORMALIZE(UPPER(t)) FROM s;
 ?column?
----------
 f
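
For readers wondering why the result is false, the following Python sketch walks through the same input. It assumes that Python's str.upper() applies the same full case mapping for the sharp s as ICU does in the query above (one character expanding to "SS"). That equivalence is an assumption for illustration, not something the query itself demonstrates.

```python
import unicodedata

# Same input as the query: A-acute, sharp s, combining acute accent.
# It is already NFC, because sharp s + U+0301 has no precomposed form.
t = unicodedata.normalize("NFC", "\u00c1\u00df\u0301")

upper = t.upper()                             # sharp s expands to "SS"
renorm = unicodedata.normalize("NFC", upper)  # "S" + U+0301 composes to U+015A

print([f"U+{ord(c):04X}" for c in upper])   # U+00C1, U+0053, U+0053, U+0301
print([f"U+{ord(c):04X}" for c in renorm])  # U+00C1, U+0053, U+015A
print(upper == renorm)                      # False
```

The uppercase mapping leaves a plain "S" directly before the combining accent, and that pair does have a precomposed form, so the uppercased string is no longer NFC even though the input was.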

> so there is a valid argument that we should continue to ignore it.
> Or, we can say that we have at least one of three compliant.

What do other databases do?

Given how costly normalization can be, imposing that on every caller
seems like a bit much. And favoring NFC for the user unconditionally
might not be the best thing. Then again, NFC is good most of the time,
and there are patches to speed up normalization.

I tend to think that a lot of users who want casefolding would also
want normalization, but it's hard to weigh that against the performance
cost. It might not matter outside of a few edge cases, though I'm not
sure exactly how many.

Regards,
Jeff Davis
