| From: | Alexander Borisov <lex(dot)borisov(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Improve the performance of Unicode Normalization Forms. |
| Date: | 2026-03-26 15:13:36 |
| Message-ID: | fe3bb776-bf6d-40f9-b83a-f64b0948cf6f@gmail.com |
| Lists: | pgsql-hackers |

Hi Michael,

25.03.2026 09:12, Michael Paquier wrote:
> On Wed, Feb 25, 2026 at 11:21:43AM +0300, Alexander Borisov wrote:
>> Gentle ping — did you have a chance to look at this patch series?
>> If anything needs to be changed/added/removed, I’m happy to update it.
>
> I find this patch series pretty cool. I cannot take it for this
> release, unfortunately, but I'd be happy to study and potentially do
> something about this patch set when v20 opens for business.

Thank you for your time! Let's aim for v20.
I would be very glad if we could genuinely improve Unicode support in PostgreSQL.

> Rather than share the files you have used for your benchmarks,
> couldn't you share a script that can generate them (dirty is fine)? I
> am guessing that this could just be a perl script that goes through
> the unicode and normalization data and builds the testing patterns you
> are looking for to prove your point. This would be much better than
> uploading things somewhere: if we don't have a reproducing set of data
> on pgsql-hackers, then we'd lose a part of the test history. That
> would not be cool. Any committer who could look at your patch will
> need these files to double-check your claims, and anything that
> reduces the review burden can speed up the evaluation process.
>
> Jeff has posted some scenarios upthread, but I am also wondering what
> you have exactly done in terms of benchmark, and reviewing benchmarks
> is part of the patch review process.

Attached are the rebased patches and a Perl script
(generate_NF_C_D_KC_KD_sql.pl) for generating the test files. The comment
at the top of the script describes how to use it and what it generates.

--
Alexander Borisov

| Attachment | Content-Type | Size |
|---|---|---|
| v10-0001-Add-Perl-module-PrettyLine-to-a-common-module.patch | text/plain | 4.6 KB |
| v10-0002-Add-Perl-module-Sparse-Array-to-a-common-module.patch | text/plain | 11.9 KB |
| v10-0003-Improve-the-performance-of-Unicode-Normalization.patch | text/plain | 819.4 KB |
| v10-0004-Refactoring-Unicode-Normalization-Forms-performa.patch | text/plain | 320.3 KB |
| generate_NF_C_D_KC_KD_sql.pl | text/x-perl-script | 3.1 KB |