Re: Use CASEFOLD() internally rather than LOWER()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: 2026-03-25 21:02:20
Message-ID: 0c21d77497c2316f9f5af143122dd24a81eb40db.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2026-03-25 at 07:40 -0700, Mark Dilger wrote:
> pg_trgm appears to be lossy, with recheck logic.  I would think you
> just need to make it give answers which at least include everything
> that a regex would match, and then allow recheck to prune that down. 
> My concern is having pg_trgm give less than all the answers, so that
> after recheck you get fewer results than a seqscan would have
> returned.  Would switching to casefold be strictly broader than
> regex?

I think the precise question would be: "are there any two characters
that lowercase to the same character but do not casefold to the same
character?"

I don't have a counterexample, so perhaps using casefold would still be
fine.
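
One rough way to look for a counterexample to the question above is a
brute-force scan over single code points. The sketch below is only an
approximation: it uses Python's str.lower() and str.casefold(), which
follow Python's bundled Unicode data and full case mappings, and those
may differ from what the builtin, ICU, or libc providers do in
PostgreSQL. The function names are made up for illustration.

```python
import sys
from collections import defaultdict


def lowercase_groups_split_by_fold(chars, lower=str.lower, fold=str.casefold):
    """Return (lowercase form, members) for every group of characters that
    share a lowercase form but do not all share a case-folded form.

    `lower` and `fold` default to Python's own mappings, which are an
    assumption here and not necessarily what PostgreSQL would compute.
    """
    by_lower = defaultdict(set)
    for ch in chars:
        by_lower[lower(ch)].add(ch)

    split = []
    for low, members in by_lower.items():
        folded_forms = {fold(ch) for ch in members}
        if len(members) > 1 and len(folded_forms) > 1:
            split.append((low, sorted(members)))
    return split


def all_code_points():
    """Yield every code point except surrogates as a one-character string."""
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield chr(cp)


if __name__ == "__main__":
    for low, members in lowercase_groups_split_by_fold(all_code_points()):
        # Print code points in hex to avoid terminal encoding problems.
        print([hex(ord(c)) for c in low], [hex(ord(c)) for c in members])
```

Any group it prints would be a candidate counterexample under Python's
mappings; an empty result would not prove anything about a particular
PostgreSQL provider, and it says nothing about multi-character
sequences.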

Thoughts? Should we enhance regexes to consider more than two case
variants first, or should we proceed with some of these patches (and/or
a similar change to pg_trgm)?

> Sorry if this misses something discussed upthread.  I'm clearly
> assuming here that you don't mind that such a change necessitates a
> REINDEX. 

That's a concern. How much it matters may depend on how big the impact
would be -- for libc I don't think it would matter, because lowercasing
and casefolding are the same thing there.

Regards,
Jeff Davis
