Re: Use CASEFOLD() internally rather than LOWER()

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: 2026-01-13 02:14:02
Message-ID: 1A46D941-E0A4-4B3E-AAEA-1F7B6CCD24E6@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jan 13, 2026, at 02:22, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
>
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
>
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
>
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
>
> Regards,
> Jeff Davis
>
>
> <v1-0001-ILIKE-use-CASEFOLD-rather-than-LOWER.patch><v1-0002-citext-use-CASEFOLD-rather-than-LOWER.patch><v1-0003-dict_xsyn-use-CASEFOLD-rather-than-LOWER.patch><v1-0004-tsearch-use-CASEFOLD-rather-than-LOWER.patch>

Hi Jeff,

Thanks for the patch. I have reviewed the patch set and got a few comments for tests:

1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```

Do we want to added one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
?column?
----------
t
(1 row)
```
Which tests the different behaviors against collations.

2 - 0002

Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
?column?
----------
f
(1 row)
```

I initially thought to add test cases with different collations, but after debugging, I found that citext intentionally ignores specified collation.

3 - 0003 LGTM. Seems the existing test coverage is good enough.

4 - 0004

I thought to suggest add a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
?column?
----------
f
(1 row)
```

But I don’t see existing tests under backend/tsearch. So, I’m now not sure whether or not to insist the suggestion.

BWT, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
errmsg("could not determine which collation to use for %s function",
- "lower()"),
+ "casefold()”),
```

I have posted a patch to fix. See [1].

[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Steven Niu 2026-01-13 02:16:29 Re: str_casefold: fix typo in error message
Previous Message Chao Li 2026-01-13 02:09:06 str_casefold: fix typo in error message