Re: BUG #16020: ICU Collations querys

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Marina Garrido Sanchez" <garridosanchezmarina(at)gmail(dot)com>
Cc: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>,pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16020: ICU Collations querys
Date: 2019-09-26 16:39:40
Message-ID: a0cc6c11-c06f-4275-ac48-890e463d0c4c@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Marina Garrido Sanchez wrote:

> create collation ca_insensitive (provider = icu, locale =
> 'es-ES-u-ks-level1', deterministic = false;
>
> create table users (
> nombre text collate "ca_insensitive" primary key unique,
> apellidos text collate "ca_insensitive",
> direccion text
> );
>
> insert into user values ("jávier", "gonzález", "BBB")
> insert into user values ("Javier", "Gonzalez", "BBB")

Independently of the syntax problem (literals must be enclosed
in single quotes, not double quotes), the second insert should
fail since 'jávier' = 'Javier' with the above-defined collation and
there is a unique index on users.nombre.

> select * from users where users.apellidos ilike '%Gonz%';
>
> I get the error that *ilike *does not support nondeterministic operation,
> for that, my question is How can I do partial match search with case- and
> accet- insensitive in postgres? or Which operator can I use to do the
> search?

If you're interested only in the case insensitiveness, you
could force a deterministic collation to the ilike argument,
for instance:

select * from users where users.apellidos ilike '%Gonz%' collate "es-x-icu";

If you really need collate-sensitive substring search with non-deterministic
collations, I don't think there is any way in Postgres 12 to get that.
If you can install the icu_ext extension [1], its icu_strpos() function
implements that using the collation-aware string search feature in ICU.

As string like '%foo%' is equivalent to strpos(string, 'foo')>0
when string is associated to a deterministic collation,
string like '%foo%' with a non-derministic collation
is not supported but it is equivalent to icu_strpos(string, 'foo')>0.

For instance, with the "ca_insensitive" collation you defined,
both 'jáv' and ''Jav' match 'jav':

insert into users values ('jávier', 'gonzález', 'BBB');
insert into users values ('Javier2', 'Gonzalez', 'BBB');

select * from users where icu_strpos(nombre, 'jav')>0;
nombre | apellidos | direccion
---------+-----------+-----------
jávier | gonzález | BBB
Javier2 | Gonzalez | BBB

[1] https://github.com/dverite/icu_ext#icu_strpos

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-09-26 22:15:13 BUG #16028: JDBC currentSchema connection parameter not working for text search configuration
Previous Message Tom Lane 2019-09-26 16:31:43 Re: BUG #16027: Invalid output of to_timestamp