Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Lanzarotta" <jeff(dot)lanzarotta(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE
Date: 2019-10-24 19:39:32
Message-ID: ba826053-a7af-43d8-9fd3-ddfbcb28104e@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jeff Lanzarotta wrote:

> I have a question about nondeterministic collations in PostgreSQL 12. I
> have created a new collation that is nondeterministic and created several
> columns which use this collation. Querying these columns works great until
> I use LIKE. When I do, I get the following error:
>
> SQL Error [0A000]: ERROR: nondeterministic collations are not supported for
> LIKE
>
> Is there any plan to allow this functionality?

PostgreSQL development is conducted without a roadmap [1]. Maybe
someone will submit a patch to enable LIKE with nondeterministic
collations, but so far it did not happen according to the current set
of patches at https://commitfest.postgresql.org

Such matches can be weirder than you might think (not to
mention much slower).
Consider for instance a collation that ignores punctuation:

CREATE COLLATION "nd3alt" (
provider = 'icu',
locale='und(at)colAlternate=shifted',
deterministic = false
);

In the icu_ext extension, icu_strpos [2] can match a substring with
a nondeterministic collation, which is one part of what LIKE
would need to do for such collations. The implementation uses
the string search facility of the ICU library.

With the above-defined collation, we can have for instance:

SELECT icu_strpos('abc. ...de', 'c,d' COLLATE nd3alt);
icu_strpos
------------
3

So even though 'c,d' is not a substring of 'abc. ...de' in the common
sense, it is recognized as such by this collation, by design.

A LIKE operator for nondeterministic collations should be able to
recognize this too, but with an arbitrary number of substrings to
match in the pattern, plus it should handle the underscore wildcard
in a way that hopefully makes sense.

With the example above,
'abc. ...de' LIKE '%c,d%' COLLATE nd3alt
should certainly be a match, but in the case of this variant:
'abc. ...de' LIKE '%c_d%' COLLATE nd3alt
it's not necessarily clear how (or even if) it should work.

[1] https://www.postgresql.org/developer/roadmap/
[2] https://github.com/dverite/icu_ext#icu_strpos

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Lanzarotta 2019-10-24 20:09:02 Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE
Previous Message Laurenz Albe 2019-10-24 19:15:42 Re: jsonb_set() strictness considered harmful to data