From: | Jim Finnerty <jfinnert(at)amazon(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: insensitive collations |
Date: | 2021-03-24 20:41:38 |
Message-ID: | 1616618498263-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive ICU
collation, a LIKE predicate can be used with a small transformation of the
predicate, and the pattern can contain multi-byte characters:
from:
SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';
-- ERROR: nondeterministic collations are not supported for LIKE
to:
SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE
lower('midi-Pyrené%');
and if there is an index defined as follows:
CREATE INDEX location_LOWER_C_index ON
locations (LOWER(location) COLLATE "C");
then the LIKE predicate above performs the desired CI_AS evaluation and also
exploits the index:
EXPLAIN VERBOSE SELECT * FROM locations WHERE LOWER(location) COLLATE "C"
LIKE LOWER('midi-Pyrené%');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using location_LOWER_C_index on public.locations
(cost=0.13..8.16 rows=1 width=18)
Output: location
Index Cond: (((lower((locations.location)::text))::text >=
'midi-pyren?'::text) AND ((lower((locations.location)::text))::text <
'midi-pyren?'::text))
Filter: ((lower((locations.location)::text))::text ~~
'midi-pyren?%'::text)
(4 rows)
It turns out that CI_AS represents the vast majority (more than 99.9%) of
nondeterministic collations that we are seeing in babelfish, because
SQL_Latin1_General_CP1_CI_AS is the default collation in SQL Server.
Currently nondeterministic collations are disabled at the database level.
The cited reason was because of the lack of LIKE support and because certain
catalog views use LIKE. That may still need to be a limitation if those
LIKE predicates currently have an index exploitation unless we can create a
LOWER(col) COLLATE "C" index on the catalog. Which catalog views were these
that had the problem?
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2021-03-24 20:46:45 | Re: pg_amcheck contrib application |
Previous Message | Peter Eisentraut | 2021-03-24 19:50:59 | Re: truncating timestamps on arbitrary intervals |