From: | Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Роман Литовченко <roman(dot)lytovchenko(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
Date: | 2020-06-12 16:40:55 |
Message-ID: | 20200612184055.205f0159@firost |
Lists: | pgsql-bugs |
On Wed, 10 Jun 2020 00:29:33 +0200
Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
[...]
> After playing with ICU regression tests, I found functions ucol_strcollIter
> and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> here.
I ran some benchmarks. See the attached script and its header to reproduce
them.
The script sorts 935895 French phrases ranging from 0 to 122 characters, with
an average of 49.
The tests were run on current master HEAD (buggy) and with the attached patch,
which relies on ucol_strcollIter.
My preliminary test with ucol_getSortKey was catastrophic, as we might
expect: 15-17x slower than the current HEAD, so I dropped it from the actual
tests. I didn't try ucol_nextSortKeyPart, though.
ucol_strcollIter is ~20% slower than HEAD on UTF8 databases and roughly on par
on LATIN1, which might be acceptable. Here are the numbers:
DB Encoding   HEAD   strcollIter   ratio
UTF8          2.74   3.27          1.19x
LATIN1        5.34   5.40          1.01x
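For anyone who wants to re-derive the ratio column, here is a minimal sketch in Python. The timings are copied from the numbers above; the dictionary layout and the `slowdown` helper are only illustrative names, not part of the attached script.

```python
# Timings copied from the benchmark numbers above (units as reported by the script).
timings = {
    "UTF8": {"head": 2.74, "strcoll_iter": 3.27},
    "LATIN1": {"head": 5.34, "strcoll_iter": 5.40},
}


def slowdown(head, patched):
    """Return patched / head: how many times slower the patched build is."""
    return patched / head


# Round to two decimals, as in the ratio column.
ratios = {
    encoding: round(slowdown(t["head"], t["strcoll_iter"]), 2)
    for encoding, t in timings.items()
}

for encoding, ratio in ratios.items():
    print(f"{encoding}: {ratio:.2f}x")
```

The ratio is simply the patched time divided by the HEAD time, so 1.19x corresponds to the "~20% slower" figure for UTF8.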
I plan to add a regression test soon.
> In the meantime, I've been working on various workarounds. The only one I
> found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might
> be useful in many cases.
>
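To illustrate why the two collations quoted above are not equivalent, here is a toy sketch in Python. It does not call ICU and ignores accents and real collation weights. `toy_key` is a hypothetical helper that only mimics two ideas: Latin letters sorting before digits (the "kr-latn-digit" reordering) and, optionally, digit runs compared by numeric value (the "kn" setting).

```python
import re


def toy_key(text, numeric=False):
    """Toy sort key, NOT ICU: letters rank before digits; optional numeric runs."""
    key = []
    # Split into runs of ASCII digits and single non-digit characters.
    for run in re.findall(r"[0-9]+|[^0-9]", text):
        if run[0] in "0123456789":
            if numeric:
                # "kn"-like behaviour: the whole digit run compares as one number.
                key.append((1, int(run), ""))
            else:
                # Without "kn": digits compare one at a time.
                key.extend((1, int(digit), "") for digit in run)
        else:
            # Group 0 sorts before group 1, so letters come before digits.
            key.append((0, 0, run.lower()))
    return key


words = ["a9", "a10", "ab", "9"]
plain = sorted(words, key=toy_key)
with_kn = sorted(words, key=lambda w: toy_key(w, numeric=True))
print(plain)    # ['ab', 'a10', 'a9', '9']
print(with_kn)  # ['ab', 'a9', 'a10', '9']
```

The two orderings differ only where multi-digit numbers appear, which is why the "-kn" variant can stand in for the original collation in many cases but not all.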
> I've been working on a second workaround: creating a type (a char variant for
> our use case), its operators and opfamily. All operators and function 1 rely
> on ucol_getSortKey. Most of the workaround works well but, surprisingly, the
> sort order is only enforced if the field is in the first position:
>
> * this works: "ORDER BY f1 COLLATE digitslast"
> * this fails: "ORDER BY f2, f1 COLLATE digitslast"
I fixed this: I had not declared my opclass as the default for the type I
created. I'm not sure whether people would like to see or discuss this
user-level workaround here.
Regards,
Attachment | Content-Type | Size |
---|---|---|
test-icu.bash | application/octet-stream | 1.3 KB |
v1-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch | text/x-patch | 4.5 KB |