From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improve the performance of Unicode Normalization Forms. |
Date: | 2025-06-19 17:41:57 |
Message-ID: | 4211ffd7fe154c4af693b98d78f4a3689ce8cc30.camel@j-davis.com |
Lists: | pgsql-hackers |
On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I continue to improve/speed up Unicode in Postgres.
> Last time, we improved the lower(), upper(), and casefold()
> functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.
Did you compare against other implementations, such as ICU's
normalization functions? There's also a Rust crate here:
https://github.com/unicode-rs/unicode-normalization
that might already be well optimized.
Beyond the lookups themselves, there are other opportunities
for optimization, such as:
* reducing the need for palloc and extra buffers, perhaps by using
buffers on the stack for small strings
* operating more directly on UTF-8 data rather than decoding and
re-encoding the entire string
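To make the two ideas above concrete, here is a minimal standalone sketch. It is not PostgreSQL code and not part of the patch: the names process_utf8, scratch_stats, and SMALL_BUF_CODEPOINTS are hypothetical, malloc/free stand in for palloc/pfree, and the input is assumed to be valid UTF-8. It scans the raw bytes first (an all-ASCII string is unchanged by every normalization form, so no decoding is needed), and otherwise decodes into a stack buffer when the string is small, falling back to the heap only for larger inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold: strings up to this many bytes use the stack. */
#define SMALL_BUF_CODEPOINTS 64

/* Hypothetical result type, only here so the sketch can be inspected. */
typedef struct
{
    size_t ncodepoints;   /* code points seen */
    bool   used_heap;     /* true if the scratch buffer was heap-allocated */
    bool   fast_path;     /* true if no decoding was needed at all */
} scratch_stats;

/* Works directly on the UTF-8 bytes: true if every byte is below 0x80. */
static bool
utf8_is_all_ascii(const unsigned char *s, size_t len)
{
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++)
        acc |= s[i];
    return (acc & 0x80) == 0;
}

/* Decode UTF-8 into code points. Assumes valid input; returns the count. */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t n = 0;
    size_t i = 0;

    while (i < len)
    {
        unsigned char b = s[i++];
        uint32_t cp;
        int extra;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else                         { cp = b & 0x07; extra = 3; }

        for (; extra > 0 && i < len; extra--)
            cp = (cp << 6) | (s[i++] & 0x3F);
        out[n++] = cp;
    }
    return n;
}

static scratch_stats
process_utf8(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *) str;
    scratch_stats st = {0, false, false};
    uint32_t stack_buf[SMALL_BUF_CODEPOINTS];
    uint32_t *buf = stack_buf;

    /* ASCII-only text is already in NFC, NFD, NFKC, and NFKD. */
    if (utf8_is_all_ascii(s, len))
    {
        st.fast_path = true;
        st.ncodepoints = len;
        return st;
    }

    /* A string of len bytes holds at most len code points. */
    if (len > SMALL_BUF_CODEPOINTS)
    {
        buf = malloc(len * sizeof(uint32_t));   /* palloc in the backend */
        assert(buf != NULL);
        st.used_heap = true;
    }

    st.ncodepoints = utf8_decode(s, len, buf);
    /* A real implementation would decompose, reorder, and recompose here. */

    if (st.used_heap)
        free(buf);
    return st;
}
```

Calling process_utf8("hello", 5) takes the fast path without touching any buffer, while a short string containing a combining mark is decoded on the stack. Note that the ASCII check has to cover the whole string: an ASCII letter followed by a combining mark can still compose under NFC, so an ASCII prefix alone is not safe to copy through without further boundary checks.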
Regards,
Jeff Davis