Re: insensitive collations

From:	Jim Finnerty <jfinnert(at)amazon(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: insensitive collations
Date:	2021-03-23 19:46:04
Message-ID:	1616528764665-0.post@n3.nabble.com
Lists:	pgsql-hackers

Has any progress been made on supporting LIKE for nondeterministic
collations?

Both the pattern and the expression need to use collation-aware character
comparisons, so for a suitable collation where ß compares equal to ss:

SELECT * FROM "table" WHERE name LIKE '%ß%';
yields
Brian Bruß
Steven Sossmix
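As a rough illustration of the semantics being asked about, here is a minimal Python sketch (not ICU, and not PostgreSQL's like.c) of a LIKE matcher for a nondeterministic collation. It assumes a hypothetical collation whose equality is Unicode full case folding (`str.casefold()`), which makes ß equal to ss. Literal runs of the pattern are compared against variable-length substrings of the string, so one pattern character may match several string characters. `_` still consumes exactly one character, and escapes are ignored. The helper names `coll_eq`, `tokenize`, and `like` are made up for this sketch.

```python
from functools import lru_cache


def coll_eq(a: str, b: str) -> bool:
    # Hypothetical nondeterministic collation: two strings are equal when
    # their Unicode full case foldings match (so "ß" equals "ss").
    return a.casefold() == b.casefold()


def tokenize(pattern: str) -> list[str]:
    # Split a LIKE pattern into "%", "_" and maximal literal runs.
    # Escape characters are not handled in this sketch.
    tokens: list[str] = []
    literal = ""
    for ch in pattern:
        if ch in "%_":
            if literal:
                tokens.append(literal)
                literal = ""
            tokens.append(ch)
        else:
            literal += ch
    if literal:
        tokens.append(literal)
    return tokens


def like(s: str, pattern: str) -> bool:
    tokens = tokenize(pattern)

    @lru_cache(maxsize=None)
    def match(i: int, k: int) -> bool:
        # i: position in s (in characters); k: index of the next token.
        if k == len(tokens):
            return i == len(s)
        tok = tokens[k]
        if tok == "%":
            # "%" matches any run of zero or more characters.
            return any(match(j, k + 1) for j in range(i, len(s) + 1))
        if tok == "_":
            # "_" matches exactly one character of s.
            return i < len(s) and match(i + 1, k + 1)
        # A literal run must equal some substring s[i:j] under the
        # collation; its length need not equal len(tok), since "ß" == "ss".
        return any(
            coll_eq(s[i:j], tok) and match(j, k + 1)
            for j in range(i, len(s) + 1)
        )

    return match(0, 0)


if __name__ == "__main__":
    for name in ["Brian Bruß", "Steven Sossmix", "Alice Smith"]:
        print(name, like(name, "%ß%"))
```

Comparing whole substrings rather than single characters is what keeps a pattern like '%s%' from matching half of an expanded ß: the substring "ß" case-folds to "ss", which is not equal to "s".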

Moreover, even if the pattern contains only single-byte UTF-8 characters, a
non-accented character among the first 127 (ASCII) code points might compare
equal to a two-byte accented character in the first argument (for example, e
vs é in an accent-insensitive collation), so both the comparisons and the
character-advancing logic must be collation-aware. This seems to imply that,
for the general nondeterministic case, we need to rewrite the algorithm to
use ICU functions for advancing to the next character and for comparing
characters at the current positions in the pattern and the string. Is that
accurate?
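To make the byte-level point concrete, the sketch below uses Python's `unicodedata` module as a stand-in for an accent-insensitive, case-insensitive collation (an assumption; ICU collation elements are not modeled). The single-byte pattern character e and the two-byte é already differ at the first byte, yet compare equal at the collation level. The same accented text stored in decomposed form takes three bytes, so the amount of input consumed per pattern character is not fixed. `accent_insensitive_key` is a hypothetical helper.

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    # Hypothetical stand-in for an accent-insensitive, case-insensitive
    # collation: decompose (NFD), drop combining marks, then case-fold.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


pattern_char = "e"       # one byte in UTF-8
text_char = "\u00e9"     # precomposed "é": two bytes in UTF-8

print(pattern_char.encode("utf-8"))  # b'e'
print(text_char.encode("utf-8"))     # b'\xc3\xa9'

# A per-byte comparison sees 0x65 vs 0xC3 and fails immediately...
print(pattern_char.encode("utf-8")[0] == text_char.encode("utf-8")[0])  # False
# ...while the collation-level comparison treats them as equal.
print(accent_insensitive_key(pattern_char) == accent_insensitive_key(text_char))  # True

# The same text may also be stored decomposed ("e" + U+0301 COMBINING ACUTE),
# so the byte length consumed per pattern character varies.
decomposed_text = "e\u0301"
print(len(text_char.encode("utf-8")), len(decomposed_text.encode("utf-8")))  # 2 3
print(accent_insensitive_key(decomposed_text) == accent_insensitive_key(text_char))  # True
```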

For a database with UTF8 encoding and a collation that is case-insensitive
but accent-sensitive, where the pattern contains only single-byte
characters or wildcard characters, would LIKE and ILIKE be correct with the
current per-byte implementation, albeit without any index support?
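One way to probe this question is to compare a lowercase-then-match-bytes approach (assumed here to approximate how a per-byte ILIKE handles case) with matching under a hypothetical case-insensitive collation whose equality is Unicode full case folding. The pattern literal below is pure ASCII. Whether any real ICU case-insensitive, accent-sensitive collation agrees with case folding on these particular characters is an assumption this sketch does not check, and both function names are hypothetical.

```python
def contains_lowercase_bytes(s: str, literal: str) -> bool:
    # Rough model of a per-byte ILIKE for the pattern '%literal%':
    # lowercase both sides, encode as UTF-8, then search byte-by-byte.
    return literal.lower().encode("utf-8") in s.lower().encode("utf-8")


def contains_casefold_equal(s: str, literal: str) -> bool:
    # '%literal%' under a hypothetical case-insensitive collation whose
    # equality is Unicode full case folding: some substring of s must
    # compare equal to the literal.
    target = literal.casefold()
    return any(
        s[i:j].casefold() == target
        for i in range(len(s) + 1)
        for j in range(i, len(s) + 1)
    )


# The pattern literal "kiss" is pure ASCII in every case below.
# U+212A is KELVIN SIGN; U+017F is LATIN SMALL LETTER LONG S.
samples = ["Kiss", "\u212Aiss", "Ki\u017F\u017F"]
for s in samples:
    print(ascii(s), contains_lowercase_bytes(s, "kiss"), contains_casefold_equal(s, "kiss"))
```

Under that assumption the KELVIN SIGN sample agrees for both approaches, because it lowercases to ASCII k. The LONG S sample does not: ſ is already lowercase, so lowercasing leaves it as two non-ASCII bytes even though it case-folds to s.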

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL

In response to

Re: insensitive collations at 2019-01-30 15:30:54 from Daniel Verite

Responses

Re: insensitive collations at 2021-03-24 20:41:38 from Jim Finnerty
