From: | Vik Fearing <vik(at)postgresfriends(dot)org> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(at)eisentraut(dot)org> |
Subject: | Re: Add CASEFOLD() function. |
Date: | 2025-06-18 17:09:04 |
Message-ID: | 692d28a2-d5f8-4db5-a3ad-c7db9bab522f@postgresfriends.org |
Lists: | pgsql-hackers |
On 17/06/2025 20:14, Jeff Davis wrote:
> On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
>> If the character set of <character factor> is UTF8, UTF16, or UTF32,
>> then FR is replaced by
>> Case:
>> i) If the <search condition> S IS NORMALIZED evaluates to
>> True, then NORMALIZE (FR)
>> ii) Otherwise, FR.
> I read that as "if the input is normalized, then the output should be
> normalized", IOW preserve the normalization. But does it mean "preserve
> whatever the input normal form is" or "preserve NFC if the input is
> NFC, otherwise the normalization is undefined"?
>
> The above wording seems to mean "preserve NFC if the input is NFC",
> because that's what NORMALIZE(FR) does when the normal form is
> unspecified.
Yes, and that is also the default for <normalized predicate>.
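As a rough illustration of the rule quoted above (not PostgreSQL code), here is a minimal Python sketch using the standard library's unicodedata module. The function name casefold_preserving_nfc is hypothetical, and Python's str.casefold() only stands in for CASEFOLD(); its results may differ from what a given PostgreSQL collation provider returns. It assumes Python 3.8 or later for unicodedata.is_normalized().

```python
import unicodedata


def casefold_preserving_nfc(s: str) -> str:
    """Case-fold s, re-normalizing to NFC only when the input was NFC.

    Hypothetical helper mirroring the quoted rule:
      i)  if S IS NORMALIZED is true, return NORMALIZE(FR)
      ii) otherwise, return FR unchanged
    """
    folded = s.casefold()  # FR: the raw folded result
    if unicodedata.is_normalized("NFC", s):
        return unicodedata.normalize("NFC", folded)
    return folded


# Same code points as the SQL example later in this message:
# U+00C1, U+00DF, U+0301. The string is already in NFC.
sample = "\u00c1\u00df\u0301"

raw = sample.casefold()
kept = casefold_preserving_nfc(sample)

# Folding expands U+00DF to "ss", so the trailing U+0301 now follows
# an "s" it can compose with, and the raw result is no longer NFC.
print(unicodedata.is_normalized("NFC", sample))  # input check
print(unicodedata.is_normalized("NFC", raw))     # raw folded result
print(unicodedata.is_normalized("NFC", kept))    # after applying the rule
```

Input that is not in NFC to begin with takes branch ii) and comes back folded but otherwise untouched, which is the "only if the input was NFC" part of the rule.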
>> It does not appear to me that our LOWER and UPPER functions obey this
>> rule,
> You are correct:
>
> WITH s(t) AS
> (SELECT NORMALIZE(U&'\00C1\00DF\0301' COLLATE "en-US-x-icu"))
> SELECT UPPER(t) = NORMALIZE(UPPER(t)) FROM s;
> ?column?
> ----------
> f
>
>> so there is a valid argument that we should continue to ignore it.
>> Or, we can say that we have at least one of three compliant.
> What do other databases do?
I don't know. I am just pointing out what the Standard says. I think
we should either comply, or note that we don't do it for LOWER and UPPER
and keep CASEFOLD consistent with our existing implementation.
> Given how costly normalization can be, imposing that on every caller
> seems like a bit much.
How much does it cost to check for NFC? I honestly don't know the
answer to that question, but that is the only case where we need to
maintain normalization.
> And favoring NFC for the user unconditionally
> might not be the best thing. Then again, NFC is good most of the time,
> and there are patches to speed up normalization.
It's not unconditional; it applies only if the input was NFC.
> I tend to think that a lot of users who want casefolding would also
> want normalization, but it's hard to weigh that against the performance
> cost. It might not matter outside of a few edge cases, though I'm not
> sure exactly how many.
I defer to you and others in the thread to make this decision.
--
Vik Fearing