Re: Use CASEFOLD() internally rather than LOWER()

From:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Use CASEFOLD() internally rather than LOWER()
Date:	2026-03-22 03:14:37
Message-ID:	CAHgHdKt+_+QhHK8WXQSoMNeUz43Cp2zGNEVX6=0RSaksA9zyJw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 3, 2026 at 1:01 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Sat, 2026-02-28 at 14:27 +0100, Daniel Verite wrote:
> > I tried 0001 with a non-UTF8 database and got quickly stuck:
>
> Attached new versions. I moved the encoding check into the SQL-callable
> casefold() function, and other callers use str_casefold(). That
> slightly simplifies what happens in ILIKE, also.
>
> I removed the citext changes. citext has somewhat of a legacy status, I
> think, so I'm not sure it makes sense to try to modernize or change it.
> Also, some SQL-language functions in citext use LOWER(), so the changes
> aren't enough: we'd need to make the SQL CASEFOLD function callable in
> other encodings, and also run a citext upgrade script to change the
> definitions.
>
> Note that these changes affect the result of some expressions (e.g.
> ILIKE), so could theoretically make an expression index or predicate
> index inconsistent.
>

Thanks for the patches!

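After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
With builtin collations, the two functions produce different results, so
an ILIKE query can return different rows through a pg_trgm GIN index than
through a sequential scan. The attached WIP patch demonstrates this.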
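
As a rough illustration of why that mismatch matters, here is a minimal sketch in Python. It is not PostgreSQL code: Python's str.lower() and str.casefold() only stand in for str_tolower() and str_casefold(), the trigrams() helper is hypothetical, and the sample word is chosen because Python folds it differently from how it lowercases it. Whether a given character behaves the same way under a particular PostgreSQL builtin collation is not something this sketch claims.

```python
# Illustrative only: Python's str.lower()/str.casefold() are stand-ins for
# PostgreSQL's str_tolower()/str_casefold(); they are not the same code.

def trigrams(s):
    """Hypothetical helper: the set of 3-character windows of s, with two
    spaces prefixed and one space suffixed (the padding pg_trgm documents)."""
    padded = "  " + s + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

word = "Straße"
lowered = word.lower()      # lowercase mapping keeps the sharp s
folded = word.casefold()    # full case folding expands it to "ss"

# Trigrams an index would store if extraction lowercases the text.
index_trigrams = trigrams(lowered)
# Trigrams a matcher would look for if it casefolds the text instead.
query_trigrams = trigrams(folded)

# Trigrams the casefolded form needs but the lowercased form never produced.
missing = query_trigrams - index_trigrams
print(sorted(missing))
```

Any trigram in missing is one that a casefold-based comparison would expect and a lowercase-based extraction would not have produced, which is the shape of the index versus sequential scan disagreement described above.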

Attachment	Content-Type	Size
WIP-v3-0001-Demonstrate-inconsistency-in-gin-index-vs-seq-sca.patch-WIP	application/octet-stream	12.1 KB

In response to

Re: Use CASEFOLD() internally rather than LOWER() at 2026-03-03 21:01:48 from Jeff Davis

Responses

Re: Use CASEFOLD() internally rather than LOWER() at 2026-03-24 23:07:51 from Jeff Davis
