| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
| Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Use CASEFOLD() internally rather than LOWER() |
| Date: | 2026-03-25 21:02:20 |
| Message-ID: | 0c21d77497c2316f9f5af143122dd24a81eb40db.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Wed, 2026-03-25 at 07:40 -0700, Mark Dilger wrote:
> pg_trgm appears to be lossy, with recheck logic. I would think you
> just need to make it give answers which at least include everything
> that a regex would match, and then allow recheck to prune that down.
> My concern is having pg_trgm give less than all the answers, so that
> after recheck you get fewer results than a seqscan would have
> returned. Would switching to casefold be strictly broader than
> regex?
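I think the precise question is: "Are there any two characters that
lowercase to the same character but do not casefold to the same
character?" If no such pair exists, casefolding merges at least
everything that lowercasing merges, so a casefold-based index should
not miss rows that a lowercase-based one would find.
I don't have a counterexample, so using casefold would probably still
be fine.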
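A rough way to look for such a counterexample is a brute-force scan over code points. The sketch below is only an approximation: it uses Python's `str.lower()` and `str.casefold()`, which follow the Unicode tables bundled with the interpreter and not PostgreSQL's builtin, ICU, or libc providers, and it checks single code points only, so context-sensitive mappings are out of scope. The helper names `find_counterexamples` and `all_codepoints` are made up for illustration.

```python
import sys
from collections import defaultdict


def find_counterexamples(codepoints):
    """Return (lowercase form, members) groups whose members share a
    lowercase form but do not all share a casefolded form."""
    by_lower = defaultdict(list)
    for cp in codepoints:
        ch = chr(cp)
        by_lower[ch.lower()].append(ch)

    bad = []
    for lowered, members in by_lower.items():
        # More than one distinct folded form means lowercasing merges
        # these characters but casefolding would keep them apart.
        if len({m.casefold() for m in members}) > 1:
            bad.append((lowered, members))
    return bad


def all_codepoints():
    """Every code point except the surrogate range."""
    return (cp for cp in range(sys.maxunicode + 1)
            if not 0xD800 <= cp <= 0xDFFF)


if __name__ == "__main__":
    # Prints one line per counterexample group, if any exist in the
    # Unicode version this interpreter was built with.
    for lowered, members in find_counterexamples(all_codepoints()):
        print(repr(lowered), [f"U+{ord(m):04X}" for m in members])
```

An empty result here would only suggest that no counterexample exists for that Unicode version under Python's mappings; the same check would need to be repeated against each provider's own tables before relying on it.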
Thoughts? Should we enhance regexes to consider more than two case
variants first, or should we proceed with some of these patches (and/or
a similar change to pg_trgm)?
> Sorry if this misses something discussed upthread. I'm clearly
> assuming here that you don't mind that such a change necessitates a
> REINDEX.
That's a concern. It may depend on how big the impact would be -- for
libc I don't think it would matter, because lowercasing and casefolding
are the same thing there.
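To get a feel for how many stored values a REINDEX would actually affect under a Unicode-aware provider, one could count the values whose folded key changes. This is a sketch under the same assumption as any Python-based check: `str.lower()` and `str.casefold()` stand in for the provider's mappings and may not agree with them exactly. The function name `changed_keys` is hypothetical.

```python
def changed_keys(values):
    """Return the values whose folded form differs between lower()
    and casefold(), i.e. whose index key would change."""
    return [v for v in values if v.lower() != v.casefold()]


if __name__ == "__main__":
    sample = ["Hello", "Straße"]
    # "Hello" folds to "hello" either way; "Straße" lowercases to
    # "straße" but casefolds to "strasse", so only it is reported.
    print(changed_keys(sample))
```

Only values containing characters such as "ß" would produce different keys, which is why the practical impact may be small for many data sets.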
Regards,
Jeff Davis