From: | Jim Finnerty <jfinnert(at)amazon(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: insensitive collations |
Date: | 2021-04-03 18:47:46 |
Message-ID: | 1617475666203-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
My previous post had a flaw, but fixing that flaw led me to what may be a
bug?
Using column COLLATE "C" ILIKE pattern almost does what you'd like, but the
single-character wildcard is treated as a single byte with "C", and that
won't match a multi-byte character. The fix, I thought, would be to use a
deterministic CS_AS ICU collation, since we can handle the per-character
advance correctly in that case (we think). Well, maybe not. It looks like
single-character wildcards using a deterministic ICU collation match a
single byte instead of a single character:
This creates a deterministic case-sensitive, accent-sensitive collation in a
utf8-encoded database:
SET client_encoding = WIN1252;
CREATE COLLATION CS_AS (
provider = icu,
locale = 'utf8(at)colStrength=secondary;colCaseLevel=yes',
deterministic = true);
CREATE TABLE locations (location VARCHAR(255) COLLATE CS_AS);
CREATE INDEX location_index ON locations (location);
INSERT INTO locations VALUES ('Franche-Comté')
, ('Midi-Pyrénées')
, ('midi-Pyrénées')
, ('midi-Pyrenées')
, ('Brian Bruß')
, ('Brian Bruss')
, ('Steven Sossmix')
, ('Provence-Alpes-Côte d Azur');
postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comté';
location
----------------
Franche-Comté
(1 row)
postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt_'; --
is this a bug?
location
----------
(0 rows)
postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__'; --
the wildcard is applied byte by byte instead of character by character, so
the 2-byte accented character is matched only by 2 '_'s
location
----------------
Franche-Comté
(1 row)
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | Chapman Flack | 2021-04-03 18:48:53 | Re: [PATCH] Implement motd for PostgreSQL |
Previous Message | Joel Jacobson | 2021-04-03 18:24:10 | Re: [PATCH] Implement motd for PostgreSQL |