Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query

From: James Inform <james(dot)inform(at)pharmapp(dot)de>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query
Date: 2019-07-04 16:40:11
Message-ID: 1820097204.279159.1562258411044@email.ionos.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The weird thing is, that this behavior only occures when used on a field field a primary key:

If you do:

create database testdb;

\c testdb;

-- Just create a simple table with one column
create table icutest(data text not null collate "de-x-icu" primary key, data2 text collate "de-x-icu");

-- Insert a record with uppercase string
insert into icutest values ('MYTEST','MYTEST');

-- This is not giving a match
select * from icutest where data ilike 'mytest';

-- BUT THIS GIVES A MATCH:

select * from icutest where data2 ilike 'mytest';

-- So it seems to be especially related to the scenario where a primary key / index exists.

> On July 4, 2019 at 12:36 PM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
>
>
> PG Bug reporting form wrote:
> > -- Just create a simple table with one column
> > create table icutest(data text not null collate "de-x-icu" primary key);
> >
> > -- Insert a record with uppercase string
> > insert into icutest values ('MYTEST');
> >
> > -- This is not giving a match
> > select * from icutest where data ilike 'mytest';
>
> This also happens on v10 and on the master branch.
>
> The bug seems to come from a mistake in like_support.c:
>
>
> /* * Check whether char is a letter (and, hence, subject to case-folding)
> * * In multibyte character sets or with ICU, we can't use isalpha, and it does * not seem worth trying to convert to wchar_t to use iswalpha. Instead,
> just * assume any multibyte char is potentially case-varying.
> */
> static int
> pattern_char_isalpha(char c, bool is_multibyte,
> pg_locale_t locale, bool locale_is_c)
> {
> if (locale_is_c)
> return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
> else if (is_multibyte && IS_HIGHBIT_SET(c))
> return true;
> else if (locale && locale->provider == COLLPROVIDER_ICU)
> return IS_HIGHBIT_SET(c) ? true : false;
>
>
> With an ICU locale, this returns false for all characters in 'mytest'.
>
> I think this eventually leads the caller to incorrectly believe that it
> can optimize the test into an exact match (data='mytest'), given
> there are otherwise no wildcards in the pattern.
>
> On fixing the bug, if we make this function returns true for all
> characters under an ICU locale, it appears to work, but we're loosing an
> opportunity to optimize for some patterns.
> If OTOH we wanted to use an ICU call like u_isalpha(), to be closer
> to what's done with libc, we'd need to pass a UChar32 argument,
> not a char, and since we're in a char-oriented context, I don't see how
> to do that.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Manuel Rigger 2019-07-04 16:40:38 Re: ALTER TABLE results in "ERROR: could not open relation with OID 43707388"
Previous Message David G. Johnston 2019-07-04 16:24:27 Re: ALTER TABLE results in "ERROR: could not open relation with OID 43707388"