Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From:	Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Роман Литовченко <roman(dot)lytovchenko(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
Date:	2020-06-09 22:29:33
Message-ID:	20200610002933.6a6d482b@firost
Lists:	pgsql-bugs

Hello,

I didn't find any other discussion related to this bug, either on pgsql-bugs
or on pgsql-hackers. Hopefully, this is the best thread for an update.

On Sat, 21 Jul 2018 13:39:12 +1200
Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >> It appears that the main support function 1 routine disagrees with the
> >> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> >> problem a bit further.
> >
> > As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> > with this digitslast collation, which ucol_nextSortKeyPart() fails to
> > be bug-compatible with. Other similar customized collations (e.g.
> > 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> > way.)
> >
> > I'm using libicu60. What version are you using, Roman?
> >
> > I tried to find something that matches this on the ICU bug tracker.
> > This might be a match: https://ssl.icu-project.org/trac/ticket/12518
>
> FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

Some colleagues hit this bug as well last week and reported it to me. I can
reproduce it with the current ICU master branch (post 67.1).

I wrote a regression test for icu4c and posted it on ICU-12518. See:
https://unicode-org.atlassian.net/browse/ICU-12518

As Peter wrote, the ucol_strcollUTF8 (and ucol_strcoll) functions are affected.
A quick and dirty patch replacing ucol_strcoll* with ucol_getSortKey/strcmp
everywhere fixed the bug in my tests.
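
To illustrate why comparing sort keys bytewise sidesteps the problem, here is a
minimal toy sketch in Python. It does not call ICU: toy_sort_key is a
hypothetical stand-in for ucol_getSortKey that only handles ASCII letters and
digits and simply weighs digits after letters. The point is only that when the
index build and the comparison function both derive their answer from the same
keys, they cannot disagree.

```python
from functools import cmp_to_key


def toy_sort_key(s: str) -> bytes:
    """Hypothetical stand-in for ucol_getSortKey (NOT ICU's algorithm).

    ASCII letters get weights 1..26, ASCII digits get 27..36, so digits
    sort after letters. Anything else gets weight 255.
    """
    out = bytearray()
    for ch in s.lower():
        if "a" <= ch <= "z":
            out.append(ord(ch) - ord("a") + 1)
        elif "0" <= ch <= "9":
            out.append(ord(ch) - ord("0") + 27)
        else:
            out.append(255)
    return bytes(out)


def compare_via_keys(a: str, b: str) -> int:
    """Three-way comparison done as a bytewise compare of the two sort keys."""
    ka, kb = toy_sort_key(a), toy_sort_key(b)
    return (ka > kb) - (ka < kb)


words = ["1a", "a1", "b", "a", "2"]

# Order an index build would produce if it sorted on the keys directly.
index_order = sorted(words, key=toy_sort_key)

# Order implied by the pairwise comparison function used during lookups.
lookup_order = sorted(words, key=cmp_to_key(compare_via_keys))

print(index_order)
print(index_order == lookup_order)
```

The bug discussed in this thread is the opposite situation: the pairwise
comparison and the key-based order come from two different code paths that
disagree for this collation.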

After playing with the ICU regression tests, I found that the functions
ucol_strcollIter and ucol_nextSortKeyPart are safe. I'll do some performance
tests and report here.

In the meantime, I've been working on various workarounds. The only one I have
found so far is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
Unfortunately, the two collations are not equivalent, but I believe it might be
useful in many cases.
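
To show how the two collations differ: the "kn" keyword turns on numeric
ordering, so runs of digits are compared by their numeric value instead of
digit by digit. The following toy sketch in Python mimics only that difference.
It does not use ICU, and key_plain and key_numeric are hypothetical helpers
that handle ASCII only, with letters before digits in both cases.

```python
import re

ASCII_DIGITS = "0123456789"


def key_plain(s: str):
    """Toy key: letters (group 0) before digits (group 1), one char at a time."""
    return [(1, int(c)) if c in ASCII_DIGITS else (0, ord(c)) for c in s]


def key_numeric(s: str):
    """Toy key with numeric ordering: each digit run weighs as one number."""
    parts = re.findall(r"[0-9]+|[^0-9]", s)
    return [(1, int(p)) if p[0] in ASCII_DIGITS else (0, ord(p)) for p in parts]


values = ["a10", "a2", "a1", "10", "9"]

plain_order = sorted(values, key=key_plain)
numeric_order = sorted(values, key=key_numeric)

print(plain_order)    # digit by digit: "a10" sorts before "a2"
print(numeric_order)  # by value: "a2" sorts before "a10"
```

Data that contains multi-digit numbers will therefore come out in a different
order after switching to the "kn" variant.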

I've also been working on a second workaround: creating a type (a char variant
for our use case), its operators and its opfamily. All operators and support
function 1 rely on ucol_getSortKey. Most of the workaround works well but,
surprisingly, the sort order is only enforced if the field is in the first
position:

* this works: "ORDER BY f1 COLLATE digitslast"
* this fails: "ORDER BY f2, f1 COLLATE digitslast"

I haven't had time to investigate this last point further.

Regards,

In response to

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows at 2018-07-21 01:39:12 from Thomas Munro

Responses

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows at 2020-06-12 16:40:55 from Jehan-Guillaume de Rorthais
Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows at 2020-09-02 12:06:18 from Peter Eisentraut
