| From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
|---|---|
| To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
| Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Use CASEFOLD() internally rather than LOWER() |
| Date: | 2026-03-26 00:01:26 |
| Message-ID: | CAHgHdKuGR7aJxZu7VTPA+kEDkzqJvKmi5799rhW+sKyt-WVihQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".
>
I don't know. I set up a test to iterate over all character pairs in all
locales, and I didn't find any on my system. Other searching suggests that
the Turkish and Azerbaijani locales do have this characteristic: I (U+0049)
lowercases to ı (U+0131) but case folds to i (U+0069), while ı (U+0131)
lowercases to itself and also case folds to itself. I have not confirmed
that empirically, though.
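
For illustration only, here is a minimal sketch of the kind of pair search described above, written in Python rather than against PostgreSQL. It rests on assumptions: Python's str.lower() and str.casefold() apply the locale-independent Unicode default mappings, so they say nothing about what any PostgreSQL collation provider actually does, and turkish_lower() is a hypothetical stand-in for tr/az lowercasing that hard-codes only the two dotted/dotless I rules. The name find_conflicts() is likewise invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def turkish_lower(ch: str) -> str:
    """Hypothetical stand-in for tr/az lowercasing of a single character."""
    if ch == "I":        # U+0049 -> U+0131 (dotless i)
        return "\u0131"
    if ch == "\u0130":   # U+0130 (capital I with dot) -> U+0069
        return "i"
    return ch.lower()    # everything else: Unicode default lowercasing


def find_conflicts(chars, lower=str.lower, fold=str.casefold):
    """Return pairs that share a lowercase form but case fold differently."""
    groups = defaultdict(list)
    for ch in chars:
        groups[lower(ch)].append(ch)

    conflicts = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            if fold(a) != fold(b):
                conflicts.append((a, b))
    return conflicts


# Assumed Turkish-style lowercasing combined with default (non-Turkic) folding.
tailored = find_conflicts(["I", "\u0131"], lower=turkish_lower)

# Default lowercasing and default folding over the same two characters.
untailored = find_conflicts(["I", "\u0131"])
```

Under those assumptions, tailored holds the single pair (I, ı) and untailored is empty, which matches the behavior described above. Whether a given PostgreSQL provider (libc, ICU, builtin) pairs Turkish lowercasing with non-Turkic folding in this way is exactly the part that still needs empirical confirmation.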
> I don't have a counterexample, so perhaps using casefold would still be
> fine.
>
> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?
>
I don't want to take a strong position either way. I'm still wrapping my
head around the various implications of the proposed changes, and don't
feel I have a complete picture yet.
--
*Mark Dilger*