From: | Jim Finnerty <jfinnert(at)amazon(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: insensitive collations |
Date: | 2021-03-25 13:09:49 |
Message-ID: | 1616677789675-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
The two ideas can be combined to create a workaround for accent-sensitive
nondeterministic collations that enables an ordinary btree to be exploited
if available, and also provides the full LIKE logic in either case-sensitive
or case-insensitive collations:
SELECT * FROM locations WHERE location LIKE 'midi-Pyrén%ées';
becomes:
SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree
LIKE would also be valid for a case sensitive collation, but then the
collation would be CS_AS, so it would be deterministic and no transform
would be needed.
The expression above produces a good plan, but EXPLAIN complains if the
concatenated expression is not a valid character for the current
client_encoding, which I had set to WIN1252 to display the accented
characters properly on the client:
babel=# SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
location
-----------------
Midi-Pyrénées
midi-Pyrénées
(2 rows)
babel=# EXPLAIN VERBOSE SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
ERROR: character with byte sequence 0xef 0xbf 0xbf in encoding "UTF8" has
no equivalent in encoding "WIN1252"
reset client_encoding;
babel=# EXPLAIN VERBOSE SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Only Scan using *location_index* on public.locations
(cost=0.13..8.15 rows=1 width=18)
Output: location
Index Cond: ((locations.location >= 'midi-Pyrén'::text) AND
(locations.location <= 'midi-Pyrén'::text))
Filter: ((locations.location)::text ~~* 'midi-Pyrén%ées'::text)
(4 rows)
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2021-03-25 13:14:51 | Re: Is it useful to record whether plans are generic or custom? |
Previous Message | Fujii Masao | 2021-03-25 13:06:32 | Re: About to add WAL write/fsync statistics to pg_stat_wal view |