BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
Date: 2020-07-15 13:52:20
Message-ID: 20200715155220.4bb89f56@firost
Lists: pgsql-hackers


I'm bumping this thread on pgsql-hackers in the hope that it will draw some more attention.

Should we try to fix this issue or not? This is clearly an upstream bug. It has
been reported upstream, including regression tests, but nothing has moved in 2 years.

If we choose not to fix it on our side, e.g. with a workaround (see the patch), I
suppose this small bug should at least be documented somewhere so people are not
left alone in the wild.



Begin forwarded message:

Date: Sat, 13 Jun 2020 00:43:22 +0200
From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Роман Литовченко <roman(dot)lytovchenko(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:

> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> [...]
> > After playing with the ICU regression tests, I found that the functions
> > ucol_strcollIter and ucol_nextSortKeyPart are safe. I'll run some performance
> > tests and report back here.
> I did some benchmarks. See the attached script; its header explains how to
> reproduce them.
> It sorts 935,895 French phrases, from 0 to 122 characters long, 49 on average.
> Performance tests were run on the current master HEAD (buggy) and with the
> attached patch, which relies on ucol_strcollIter.
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect: 15-17x slower than the current HEAD, so I left it out of the actual
> tests. I didn't try ucol_nextSortKeyPart, though.
> With ucol_strcollIter, sorting is ~20% slower than HEAD on UTF8 databases,
> which might be acceptable. Here are the numbers:
> DB encoding   HEAD   strcollIter   ratio
> UTF8          2.74   3.27          1.19x
> LATIN1        5.34   5.40          1.01x
> I plan to add a regression test soon.
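The ratio column above is the strcollIter timing divided by the HEAD timing. A small C sketch recomputing it from the two reported rows, assuming nothing beyond the numbers in the table (the mail does not state the timing units; the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark row as reported in the mail (units not stated there). */
struct bench_row {
    const char *encoding;
    double head;        /* timing on master HEAD */
    double strcolliter; /* timing with the ucol_strcollIter patch */
};

/* Slowdown of the patched build relative to HEAD: patched / HEAD. */
static double
slowdown_ratio(const struct bench_row *row)
{
    assert(row->head > 0.0);
    return row->strcolliter / row->head;
}

/* Print each row as "<encoding> <ratio>x (<percent change>%)". */
static void
print_report(const struct bench_row *rows, int n)
{
    for (int i = 0; i < n; i++) {
        double r = slowdown_ratio(&rows[i]);
        printf("%-7s %.2fx (%+.1f%%)\n", rows[i].encoding, r, (r - 1.0) * 100.0);
    }
}
```

Fed the two rows, this gives about +19.3% for UTF8 (3.27 / 2.74) and +1.1% for LATIN1 (5.40 / 5.34), consistent with the "~20% slower" estimate.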

Please find attached the second version of the patch, including a
regression test.


Jehan-Guillaume de Rorthais

Attachment: v2-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch (text/x-patch, 6.1 KB)
