Re: Use CASEFOLD() internally rather than LOWER()

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: 2026-03-25 14:40:23
Message-ID: CAHgHdKtb2jD+DaTJU+3jnQRZ9hEXSDcPCR8DCCzZTTVeo4jQcA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 24, 2026 at 4:07 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Sat, 2026-03-21 at 20:14 -0700, Mark Dilger wrote:
> > After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
> > uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
> > With builtin collations, these produce different results.
>
> Interesting, thank you. As stated in the original message, I was unsure
> about changing pg_trgm without adjusting the regex logic, also:
>
>
> https://www.postgresql.org/message-id/64d7949bad90545f981ac7513fb0b4954daca2c9.camel@j-davis.com
>
> do you have a suggestion about an easy way to do that, or should we
> revisit in the next cycle?
>

pg_trgm appears to be lossy, with recheck logic. I would think you just
need to make it return a superset of everything a regex would match, and
then let recheck prune that down. My concern is pg_trgm returning fewer
than all the matches, so that after recheck you get fewer results than a
seqscan would have returned. Would switching to casefold be strictly
broader than regex? If so, you would just need to convert pg_trgm to use
casefold and then rely on the recheck machinery.
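
To illustrate the concern, here is a minimal Python sketch of a lossy
trigram filter followed by an exact recheck. It is only an analogy: it
uses Python's str.lower() and str.casefold() as stand-ins for
str_tolower() and str_casefold(), whose results under PostgreSQL's builtin
collations may differ, and the helper names (trigrams, index_candidates,
recheck) are hypothetical rather than anything in pg_trgm. It shows how a
filter normalized differently from the matcher can drop rows that a
seqscan would return; it does not settle whether casefold is strictly
broader than regex.

```python
def trigrams(s):
    """Return the set of 3-character substrings of s (no padding)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def index_candidates(rows, needle, normalize):
    """Lossy filter: keep rows containing every trigram of the needle.

    `normalize` plays the role of the case mapping used at trigram
    extraction time (an assumption made for this sketch).
    """
    needed = trigrams(normalize(needle))
    return [r for r in rows if needed <= trigrams(normalize(r))]


def recheck(rows, needle):
    """Exact matcher: casefold-based substring test, as a seqscan would apply."""
    n = needle.casefold()
    return [r for r in rows if n in r.casefold()]


rows = ["STRASSE", "straße", "Strasse", "street"]
needle = "straße"

# What a sequential scan with the casefold-based matcher returns.
seqscan = recheck(rows, needle)

# Filter normalized with lower(): mismatched with the matcher.
via_lower = recheck(index_candidates(rows, needle, str.lower), needle)

# Filter normalized with casefold(): consistent with the matcher.
via_casefold = recheck(index_candidates(rows, needle, str.casefold), needle)

print("seqscan:     ", seqscan)
print("via lower:   ", via_lower)
print("via casefold:", via_casefold)
```

In this toy case the lower()-normalized filter yields fewer rows than the
seqscan, which recheck cannot repair because it can only remove
candidates. The casefold()-normalized filter yields the same rows as the
seqscan.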

Sorry if this misses something discussed upthread. I'm assuming here
that you don't mind that such a change would require a REINDEX of
existing pg_trgm indexes.

--

*Mark Dilger*
