| From: | Alexander Borisov <lex(dot)borisov(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Improve the performance of Unicode Normalization Forms. |
| Date: | 2026-06-28 17:55:51 |
| Message-ID: | e165c686-a473-4775-9ce2-8a377823e9d6@gmail.com |
| Lists: | pgsql-hackers |
Rebased.
v12:
* Fixed a bug in the unicode_normalize() logic around blocked recomposition:
    The starter is now updated correctly when a new ccc == 0 character appears.
    prev_ccc is no longer reset incorrectly after a successful recomposition.
    Added an NFC regression test for: x + acute + a + acute -> x + acute + á
* Added overflow checks for uint8/uint16 table indexes and sizes.
* Added a decomposition_sort_length() helper.
* Made the generated tables match the formatting pgindent expects.
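
For readers following along, the recomposition case covered by the new NFC test can be sketched outside the patch. The Python below is only an illustration of the canonical composition rule from UAX #15, not the code in unicode_normalize(); recompose() and compose_pair() are hypothetical names, and compose_pair() leans on Python's own unicodedata module as a shortcut instead of a composition table.

```python
import unicodedata


def compose_pair(starter, ch):
    """Hypothetical helper: return the composite of (starter, ch), or None.

    Shortcut for illustration only: asks Python's NFC implementation
    whether the two code points combine into a single one.
    """
    s = unicodedata.normalize("NFC", starter + ch)
    return s if len(s) == 1 else None


def recompose(decomposed):
    """Canonical composition over an already decomposed, ordered string."""
    out = []
    starter_pos = None  # index in `out` of the current starter
    prev_ccc = 0        # ccc of the last character appended to `out`

    for ch in decomposed:
        ccc = unicodedata.combining(ch)
        if starter_pos is not None:
            # ch is blocked when something sits between it and the starter
            # whose ccc is 0 or not lower than the ccc of ch.
            intervening = len(out) - 1 > starter_pos
            blocked = intervening and (prev_ccc == 0 or prev_ccc >= ccc)
            if not blocked:
                composite = compose_pair(out[starter_pos], ch)
                if composite is not None:
                    # Successful recomposition: replace the starter in place
                    # and leave prev_ccc untouched.
                    out[starter_pos] = composite
                    continue
        out.append(ch)
        if ccc == 0:
            # A new ccc == 0 character becomes the starter.
            starter_pos = len(out) - 1
        prev_ccc = ccc

    return "".join(out)


# x + acute + a + acute: "x" has no precomposed acute form, so the first
# acute stays; "a" then becomes the new starter and takes the second acute.
result = recompose("x\u0301a\u0301")
assert result == "x\u0301\u00e1"
```

If the starter were not moved to "a" when it appears, the second acute would be tested against "x" and the final á would never be formed, which is the kind of case the regression test guards against.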
--
Best regards,
Alexander Borisov
| Attachment | Content-Type | Size |
|---|---|---|
| v12-0001-Add-TwoStageTable.pm-a-Perl-helper-for-two-stage.patch | text/plain | 14.1 KB |
| v12-0002-Improve-the-performance-of-Unicode-Normalization.patch | text/plain | 819.4 KB |
| v12-0003-Refactor-Unicode-Normalization-Forms-for-perform.patch | text/plain | 353.9 KB |