| From: | Andreas Karlsson <andreas(at)proxel(dot)se> |
|---|---|
| To: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up ICU case conversion by using ucasemap_utf8To*() |
| Date: | 2025-12-31 00:18:40 |
| Message-ID: | c380cf11-dd23-4a75-af3e-3da8ec88bf6c@proxel.se |
| Lists: | pgsql-hackers |

On 12/20/24 8:24 PM, Jeff Davis wrote:
> On Fri, 2024-12-20 at 06:20 +0100, Andreas Karlsson wrote:
>> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
>> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>>
>> master: ~540 ms
>> Patched: ~460 ms
>> glibc: ~410 ms
>
> It looks like you are opening and closing the UCaseMap object each
> time. Why not save it in pg_locale_t? That should speed it up even more
> and hopefully beat libc.

Fixed. New benchmarks are:

SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
"sv-SE-x-icu") FROM generate_series(1, 1000000) i);

master: ~570 ms
Patched: ~340 ms
glibc: ~400 ms

So it does indeed seem like we got a further speedup, and the patched
version is now faster than glibc.
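
The caching idea discussed above (keep the case map in the locale object instead of opening and closing it on every call) can be sketched roughly as follows. This is not the code from the patch: `CaseMapSketch`, `LocaleSketch`, `casemap_open_sketch()` and `locale_upper_sketch()` are hypothetical stand-ins for `UCaseMap`, `pg_locale_t` and the ICU calls `ucasemap_open()` / `ucasemap_utf8ToUpper()`, chosen so the example compiles without ICU. The ASCII-only table is purely illustrative.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ICU's UCaseMap: a table that is costly to build. */
typedef struct CaseMapSketch
{
    unsigned char upper[256];
} CaseMapSketch;

/* Counts how often the expensive "open" step actually runs. */
static int casemap_open_calls = 0;

static CaseMapSketch *
casemap_open_sketch(void)
{
    CaseMapSketch *map = malloc(sizeof(*map));

    if (map == NULL)
        return NULL;
    for (int i = 0; i < 256; i++)
        map->upper[i] = (unsigned char) toupper(i);
    casemap_open_calls++;
    return map;
}

/* Hypothetical stand-in for pg_locale_t: the locale owns the cached map. */
typedef struct LocaleSketch
{
    const char *name;
    CaseMapSketch *casemap;     /* NULL until first use */
} LocaleSketch;

/*
 * Uppercase src into dst (at most cap - 1 bytes plus terminator), opening
 * the case map only on the first call for this locale.  Returns the number
 * of bytes written, or 0 if cap is 0 or the map could not be opened.
 */
static size_t
locale_upper_sketch(LocaleSketch *loc, char *dst, size_t cap, const char *src)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    if (loc->casemap == NULL)
        loc->casemap = casemap_open_sketch();
    if (loc->casemap == NULL)
    {
        dst[0] = '\0';
        return 0;
    }
    while (src[n] != '\0' && n + 1 < cap)
    {
        dst[n] = (char) loc->casemap->upper[(unsigned char) src[n]];
        n++;
    }
    dst[n] = '\0';
    return n;
}
```

Calling `locale_upper_sketch()` repeatedly on the same `LocaleSketch` leaves `casemap_open_calls` at 1, which is the effect the per-locale cache is meant to have across the one million `upper()` calls in the benchmark.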
> Also, to support older ICU versions consistently, we need to fix up the
> locale name to support "und"; cf. pg_ucol_open(). Perhaps factor out
> that logic?

Fixed.
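
For readers unfamiliar with the "und" issue: as I understand it, older ICU versions do not accept "und" as a spelling of the root locale, so the name has to be rewritten to "root" before it is handed to ICU. A minimal sketch of such a fixup, assuming it amounts to replacing a leading "und" language subtag, is below. `fix_und_locale()` is a hypothetical name, and the plain prefix check is a simplification of my own; the actual logic in pg_ucol_open() may differ, for example by asking ICU to parse the language and by applying the rewrite only for old ICU versions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy loc into dst, spelling a leading "und"
 * language subtag as "root".  Other locale names are copied unchanged.
 * Returns 0 on success, -1 if dst is too small.
 */
static int
fix_und_locale(const char *loc, char *dst, size_t cap)
{
    const size_t undlen = 3;    /* strlen("und") */
    int n;

    /* Only treat "und" as the language if a separator or the end follows. */
    if (strncmp(loc, "und", undlen) == 0 &&
        (loc[undlen] == '\0' || loc[undlen] == '-' ||
         loc[undlen] == '_' || loc[undlen] == '@'))
        n = snprintf(dst, cap, "root%s", loc + undlen);
    else
        n = snprintf(dst, cap, "%s", loc);

    /* snprintf reports the length it wanted; detect truncation. */
    return (n >= 0 && (size_t) n < cap) ? 0 : -1;
}
```

Factoring this into one helper lets both the collator and the case-map code paths open ICU objects with the same fixed-up name.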

Andreas

| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Use-optimized-versions-of-ICU-case-conversion-for.patch | text/x-patch | 13.6 KB |