Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query

From: James Inform <james(dot)inform(at)pharmapp(dot)de>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query
Date: 2019-07-04 13:41:45
Message-ID: 1797229370.392447.1562247705371@email.ionos.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
</head><body><p>The weird thing is, that this behavior only occures when used on a field field a primary key:</p><p>If you do:&#160;</p><p>create database testdb;</p><p>\c testdb;</p><p>-- Just create a simple table with one column<br>create table icutest(data text not null collate &#34;de-x-icu&#34; primary key, data2 text collate &#34;de-x-icu&#34;);</p><p>-- Insert a record with uppercase string<br>insert into icutest values (&#39;MYTEST&#39;,&#39;MYTEST&#39;);</p><p>-- This is not giving a match<br>select * from icutest where data ilike &#39;mytest&#39;;</p><p>-- BUT THIS GIVES A MATCH:</p><p>select * from icutest where data2 ilike &#39;mytest&#39;;</p><p>-- So it seems to be especially related to the scenario where a primary key / index exists.</p><blockquote type="cite">On July 4, 2019 at 12:36 PM Daniel Verite &#60;<a href="mailto:daniel(at)manitou-mail(dot)org">daniel(at)manitou-mail(dot)org</a>&#62; wrote:<br><br><br> PG Bug reporting form wrote:<blockquote type="cite">-- Just create a simple table with one column <br>create table icutest(data text not null collate &#34;de-x-icu&#34; primary key); <br><br>-- Insert a record with uppercase string <br>insert into icutest values (&#39;MYTEST&#39;); <br><br>-- This is not giving a match <br>select * from icutest where data ilike &#39;mytest&#39;;</blockquote><br>This also happens on v10 and on the master branch.<br><br>The bug seems to come from a mistake in like_support.c:<br><br> <br>/* * Check whether char is a letter (and, hence, subject to case-folding) <br> * * In multibyte character sets or with ICU, we can&#39;t use isalpha, and it does * not seem worth trying to convert to wchar_t to use iswalpha. Instead,<br>just * assume any multibyte char is potentially case-varying. <br> */<br>static int<br>pattern_char_isalpha(char c, bool is_multibyte,<br> pg_locale_t locale, bool locale_is_c)<br>{<br> if (locale_is_c)<br> return (c &#62;= &#39;A&#39; &#38;&#38; c &#60;= &#39;Z&#39;) || (c &#62;= &#39;a&#39; &#38;&#38; c &#60;= &#39;z&#39;);<br> else if (is_multibyte &#38;&#38; IS_HIGHBIT_SET(c))<br> return true;<br> else if (locale &#38;&#38; locale-&#62;provider == COLLPROVIDER_ICU)<br> return IS_HIGHBIT_SET(c) ? true : false;<br><br><br>With an ICU locale, this returns false for all characters in &#39;mytest&#39;.<br><br>I think this eventually leads the caller to incorrectly believe that it<br>can optimize the test into an exact match (data=&#39;mytest&#39;), given<br>there are otherwise no wildcards in the pattern.<br><br>On fixing the bug, if we make this function returns true for all<br>characters under an ICU locale, it appears to work, but we&#39;re loosing an<br>opportunity to optimize for some patterns.<br>If OTOH we wanted to use an ICU call like u_isalpha(), to be closer<br>to what&#39;s done with libc, we&#39;d need to pass a UChar32 argument,<br>not a char, and since we&#39;re in a char-oriented context, I don&#39;t see how<br>to do that.<br><br><br>Best regards,<br>-- <br>Daniel V&#233;rit&#233;<br>PostgreSQL-powered mailer: <a href="http://www.manitou-mail.org" rel="noopener" target="_blank">http://www.manitou-mail.org</a><br>Twitter: @DanielVerite</blockquote></body></html>

Attachment Content-Type Size
unknown_filename text/html 3.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-07-04 14:00:01 BUG #15894: postgresql 9.5 will not install
Previous Message Manuel Rigger 2019-07-04 13:11:49 Re: VACUUM FULL results in deadlock