Re: insensitive collations

From: Jim Finnerty <jfinnert(at)amazon(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: insensitive collations
Date: 2021-04-03 18:47:46
Message-ID: 1617475666203-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

My previous post had a flaw, but fixing that flaw led me to what may be a
bug?

Using column COLLATE "C" ILIKE pattern almost does what you'd like, but the
single-character wildcard is treated as a single byte with "C", and that
won't match a multi-byte character. The fix, I thought, would be to use a
deterministic CS_AS ICU collation, since we can handle the per-character
advance correctly in that case (we think). Well, maybe not. It looks like
single-character wildcards using a deterministic ICU collation match a
single byte instead of a single character:

This creates a deterministic case-sensitive, accent-sensitive collation in a
utf8-encoded database:

SET client_encoding = WIN1252;
CREATE COLLATION CS_AS (
provider = icu,
locale = 'utf8(at)colStrength=secondary;colCaseLevel=yes',
deterministic = true);

CREATE TABLE locations (location VARCHAR(255) COLLATE CS_AS);
CREATE INDEX location_index ON locations (location);

INSERT INTO locations VALUES ('Franche-Comté')
, ('Midi-Pyrénées')
, ('midi-Pyrénées')
, ('midi-Pyrenées')
, ('Brian Bruß')
, ('Brian Bruss')
, ('Steven Sossmix')
, ('Provence-Alpes-Côte d Azur');

postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comté';
location
----------------
Franche-Comté
(1 row)

postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt_'; --
is this a bug?
location
----------
(0 rows)

postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__'; --
the wildcard is applied byte by byte instead of character by character, so
the 2-byte accented character is matched only by 2 '_'s
location
----------------
Franche-Comté
(1 row)

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2021-04-03 18:48:53 Re: [PATCH] Implement motd for PostgreSQL
Previous Message Joel Jacobson 2021-04-03 18:24:10 Re: [PATCH] Implement motd for PostgreSQL