Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Роман Литовченко <roman(dot)lytovchenko(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
Date: 2020-09-02 12:55:50
Message-ID: 20200902145550.6a5014fb@firost
Lists: pgsql-bugs

On Wed, 2 Sep 2020 14:06:18 +0200
Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 2020-06-10 00:29, Jehan-Guillaume de Rorthais wrote:
> > In the meantime, I've been working on various workarounds. The only one I
> > found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> > Unfortunately, the two collations are not equivalent, but I believe it
> > might be useful in many cases.
>
> What precisely is broken in the ICU library?

Comparisons through ucol_strcoll/ucol_strcollUTF8 when using a custom collation
that sorts digits after latn.

> All the examples so far refer to kr-latn-digit. Are all reorderings broken,
> or something specifically related to latn and/or digit?

I don't know. So far, I have only found a couple of reports (mine included),
all using kr-latn-digit with different languages. And as I wrote,
kr-latn-digit-kn doesn't seem to be affected, so not all reorderings are
necessarily broken.

But I have no hard facts about this, only test results.

> Are any collation customizations other than reorderings affected?

I didn't poke around to try other random customizations. The answer lies
somewhere in the ICU codebase. I suppose we'll be able to answer this question
as soon as the bug is explained.

However, the bugs reported here are all about sorting: wrong result order and/or
wrong results because of a badly sorted index.
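
To make the "badly sorted index" point concrete, here is a minimal sketch of how
one could test a comparison function for this kind of inconsistency: sort some
values with it, then check that every pair in the sorted output still agrees
with a direct comparison. An index lookup relies on exactly that agreement.
Everything here is an assumption for illustration: digits_last_cmp is a toy
"letters before digits" comparator, not ICU, and find_order_violations is a
hypothetical helper name. It does not reproduce the actual ICU bug.

```python
from functools import cmp_to_key
from itertools import combinations


def digits_last_cmp(a: str, b: str) -> int:
    """Toy stand-in for a 'letters before digits' ordering (NOT ICU)."""
    def key(s):
        # Rank each character: letters (0), then digits (1), then others (2).
        return [(0 if c.isalpha() else 1 if c.isdigit() else 2, c.lower())
                for c in s]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def find_order_violations(values, cmp):
    """Sort values with cmp, then list pairs whose sorted position
    disagrees with a direct cmp() call."""
    ordered = sorted(values, key=cmp_to_key(cmp))
    violations = []
    for i, j in combinations(range(len(ordered)), 2):
        # In a consistent total order, an earlier item never compares greater.
        if cmp(ordered[i], ordered[j]) > 0:
            violations.append((ordered[i], ordered[j]))
    return ordered, violations


ordered, violations = find_order_violations(
    ["a1", "1a", "ab", "12", "b"], digits_last_cmp)
print(ordered)
print(violations)
```

With a consistent comparator the violation list stays empty. Swapping in a real
collation's comparison function (for example through an ICU binding) would be
the next step, but that part is not shown here.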

Maybe Daniel has more experience with other customizations to share, as he
seems to work extensively with ICU and PostgreSQL?

Regards,
