Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

From: Jeff Lanzarotta <jeff(dot)lanzarotta(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE
Date: 2019-10-24 20:09:02
Message-ID: CACykOPT-KsLctDir-37v061XH=LFXdo99ZLH0pQgjCeYy1QNKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Daniel, thanks for the reply. I believe we will just remove the collation,
allow LIKE to function normally, and wait for a future patch is one is ever
provided.

On Thu, Oct 24, 2019 at 3:39 PM Daniel Verite <daniel(at)manitou-mail(dot)org>
wrote:

> Jeff Lanzarotta wrote:
>
> > I have a question about nondeterministic collations in PostgreSQL 12. I
> > have created a new collation that is nondeterministic and created several
> > columns which use this collation. Querying these columns works great
> until
> > I use LIKE. When I do, I get the following error:
> >
> > SQL Error [0A000]: ERROR: nondeterministic collations are not supported
> for
> > LIKE
> >
> > Is there any plan to allow this functionality?
>
> PostgreSQL development is conducted without a roadmap [1]. Maybe
> someone will submit a patch to enable LIKE with nondeterministic
> collations, but so far it did not happen according to the current set
> of patches at https://commitfest.postgresql.org
>
> Such matches can be weirder than you might think (not to
> mention much slower).
> Consider for instance a collation that ignores punctuation:
>
> CREATE COLLATION "nd3alt" (
> provider = 'icu',
> locale='und(at)colAlternate=shifted',
> deterministic = false
> );
>
> In the icu_ext extension, icu_strpos [2] can match a substring with
> a nondeterministic collation, which is one part of what LIKE
> would need to do for such collations. The implementation uses
> the string search facility of the ICU library.
>
> With the above-defined collation, we can have for instance:
>
> SELECT icu_strpos('abc. ...de', 'c,d' COLLATE nd3alt);
> icu_strpos
> ------------
> 3
>
> So even though 'c,d' is not a substring of 'abc. ...de' in the common
> sense, it is recognized as such by this collation, by design.
>
> A LIKE operator for nondeterministic collations should be able to
> recognize this too, but with an arbitrary number of substrings to
> match in the pattern, plus it should handle the underscore wildcard
> in a way that hopefully makes sense.
>
> With the example above,
> 'abc. ...de' LIKE '%c,d%' COLLATE nd3alt
> should certainly be a match, but in the case of this variant:
> 'abc. ...de' LIKE '%c_d%' COLLATE nd3alt
> it's not necessarily clear how (or even if) it should work.
>
>
> [1] https://www.postgresql.org/developer/roadmap/
> [2] https://github.com/dverite/icu_ext#icu_strpos
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2019-10-24 20:17:58 Re: jsonb_set() strictness considered harmful to data
Previous Message Daniel Verite 2019-10-24 19:39:32 Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE