| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Use CASEFOLD() internally rather than LOWER() |
| Date: | 2026-01-13 02:14:02 |
| Message-ID: | 1A46D941-E0A4-4B3E-AAEA-1F7B6CCD24E6@gmail.com |
| Lists: | pgsql-hackers |
> On Jan 13, 2026, at 02:22, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
>
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
>
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
>
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
>
> Regards,
> Jeff Davis
>
>
> <v1-0001-ILIKE-use-CASEFOLD-rather-than-LOWER.patch><v1-0002-citext-use-CASEFOLD-rather-than-LOWER.patch><v1-0003-dict_xsyn-use-CASEFOLD-rather-than-LOWER.patch><v1-0004-tsearch-use-CASEFOLD-rather-than-LOWER.patch>
Hi Jeff,
Thanks for the patches. I have reviewed the patch set and have a few comments on the tests:
1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```
Do we want to add one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
?column?
----------
t
(1 row)
```
That would cover the difference in behavior between the two collations.
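To illustrate why the two comparisons can disagree, here is a minimal sketch in Python (not PostgreSQL code). It assumes that PG_UNICODE_FAST applies full Unicode case folding, where 'ß' folds to 'ss', while a plain lowercase mapping leaves 'ß' unchanged. The helper names are hypothetical and exist only for this example.

```python
# Illustration only: this uses Python's str.lower() and str.casefold(),
# not PostgreSQL's LOWER()/CASEFOLD() implementation.

def matches_with_lower(a: str, b: str) -> bool:
    """Case-insensitive comparison based on lowercasing (hypothetical helper)."""
    return a.lower() == b.lower()


def matches_with_casefold(a: str, b: str) -> bool:
    """Case-insensitive comparison based on full case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Lowercasing keeps 'ß' as is, so 'straße' and 'strasse' differ.
    print(matches_with_lower("straße", "STRASSE"))
    # Full case folding maps 'ß' to 'ss', so both sides become 'strasse'.
    print(matches_with_casefold("straße", "STRASSE"))
```

The first call prints False and the second prints True, which mirrors the kind of difference the extra collation test is meant to pin down.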
2 - 0002
Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
?column?
----------
f
(1 row)
```
I initially thought of adding test cases with different collations, but after debugging, I found that citext intentionally ignores the specified collation.
3 - 0003 LGTM. Seems the existing test coverage is good enough.
4 - 0004
I thought about suggesting a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
?column?
----------
f
(1 row)
```
But I don’t see any existing tests under backend/tsearch, so I’m not sure whether to insist on this suggestion.
BTW, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
errmsg("could not determine which collation to use for %s function",
- "lower()"),
+ "casefold()"),
```
I have posted a patch to fix it; see [1].
[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/