From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jim Finnerty" <jfinnert(at)amazon(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: insensitive collations |
Date: | 2021-03-25 12:13:25 |
Message-ID: | 54ef5bfe-043e-4328-9d70-5818789838f4@manitou-mail.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jim Finnerty wrote:
> For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive
> ICU
> collation, a LIKE predicate can be used with a small transformation of the
> predicate, and the pattern can contain multi-byte characters:
>
> from:
>
> SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';
> -- ERROR: nondeterministic collations are not supported for LIKE
>
> to:
>
> SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE
> lower('midi-Pyrené%');
For prefix matching, there's a simpler way with non-deterministic
collations based on the advice in [1]
The trick is that if an ICU collation is assigned to "location",
whether it's deterministic or not,
SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';
is equivalent to:
SELECT * FROM locations WHERE location BETWEEN
'midi-Pyrené' AND 'midi-Pyrené' || E'\uFFFF';
and that will use a btree index if available.
Also, it works with all features of ND-collations and all encodings, not
just case-insensitiveness and UTF-8.
Now that doesn't solve LIKE '%midi-Pyrené%', or LIKE '%midi_Pyrené%',
but that trick could be a building block for an algorithm implementing
LIKE with ND-collations in the future.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2021-03-25 12:23:17 | Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. |
Previous Message | David Steele | 2021-03-25 12:10:04 | Re: SQL/JSON: JSON_TABLE |