Re: insensitive collations

From: Jim Finnerty <jfinnert(at)amazon(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: insensitive collations
Date: 2021-03-25 13:09:49
Message-ID: 1616677789675-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The two ideas can be combined to create a workaround for accent-sensitive
nondeterministic collations that enables an ordinary btree to be exploited
if available, and also provides the full LIKE logic in either case-sensitive
or case-insensitive collations:

SELECT * FROM locations WHERE location LIKE 'midi-Pyrén%ées';

becomes:

SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree

LIKE would also be valid for a case sensitive collation, but then the
collation would be CS_AS, so it would be deterministic and no transform
would be needed.

The expression above produces a good plan, but EXPLAIN complains if the
concatenated expression is not a valid character for the current
client_encoding, which I had set to WIN1252 to display the accented
characters properly on the client:

babel=# SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
location
-----------------
Midi-Pyrénées
midi-Pyrénées
(2 rows)

babel=# EXPLAIN VERBOSE SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
ERROR: character with byte sequence 0xef 0xbf 0xbf in encoding "UTF8" has
no equivalent in encoding "WIN1252"

reset client_encoding;

babel=# EXPLAIN VERBOSE SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Only Scan using *location_index* on public.locations
(cost=0.13..8.15 rows=1 width=18)
Output: location
Index Cond: ((locations.location >= 'midi-Pyrén'::text) AND
(locations.location <= 'midi-Pyrén￿'::text))
Filter: ((locations.location)::text ~~* 'midi-Pyrén%ées'::text)
(4 rows)

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-03-25 13:14:51 Re: Is it useful to record whether plans are generic or custom?
Previous Message Fujii Masao 2021-03-25 13:06:32 Re: About to add WAL write/fsync statistics to pg_stat_wal view