Re: Improve the performance of Unicode Normalization Forms.

From: Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve the performance of Unicode Normalization Forms.
Date: 2026-06-28 17:55:51
Message-ID: e165c686-a473-4775-9ce2-8a377823e9d6@gmail.com
Lists: pgsql-hackers

Rebased.

v12:
* Fixed a bug in the unicode_normalize() logic around blocked recomposition:
  - The starter is now updated correctly when a new ccc == 0 character appears.
  - prev_ccc is no longer reset incorrectly after a successful recomposition.
  - Added an NFC regression test for: x + acute + a + acute -> x + acute + á
* Added overflow checks for uint8/uint16 table indexes and sizes.
* Added a decomposition_sort_length() helper.
* Made the generated tables match what pgindent expects.
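
For anyone who wants to reproduce the blocked-recomposition case outside the tree, here is a minimal Python sketch of the canonical composition pass. It is not the patch's C code: recompose() and compose_pair() are hypothetical names, and compose_pair() borrows Python's own unicodedata tables instead of the generated PostgreSQL ones. It only illustrates the two rules the fix is about, namely that a new ccc == 0 character becomes the starter and that a successful composition leaves the "last kept ccc" untouched.

```python
import unicodedata


def compose_pair(starter, ch):
    """Return the primary composite of starter + ch, or None.

    Assumption: we lean on Python's NFC tables for the pair lookup
    instead of a hand-built composition table.
    """
    composed = unicodedata.normalize("NFC", starter + ch)
    return composed if len(composed) == 1 else None


def recompose(decomposed):
    """Canonical composition over an already NFD-ordered string."""
    out = []
    starter_idx = -1   # index in `out` of the current starter, -1 if none
    prev_ccc = -1      # ccc of the last char kept after the starter, -1 if none
    for ch in decomposed:
        ccc = unicodedata.combining(ch)
        if starter_idx >= 0:
            # Blocked if a kept character between the starter and ch has
            # a combining class >= ccc (this also blocks a ccc == 0 ch
            # that is not adjacent to the starter).
            blocked = prev_ccc != -1 and prev_ccc >= ccc
            if not blocked:
                composite = compose_pair(out[starter_idx], ch)
                if composite is not None:
                    out[starter_idx] = composite
                    continue  # ch was consumed; prev_ccc stays as it was
        out.append(ch)
        if ccc == 0:
            # A new ccc == 0 character becomes the starter.
            starter_idx = len(out) - 1
            prev_ccc = -1
        else:
            prev_ccc = ccc
    return "".join(out)


def nfc(text):
    """NFC as decomposition (NFD) followed by the composition pass."""
    return recompose(unicodedata.normalize("NFD", text))


# The case from the regression test: x + acute + a + acute
sample = "x\u0301a\u0301"
print([f"U+{ord(c):04X}" for c in nfc(sample)])
```

If the starter were left pointing at "x" after "a" appears, the second acute would be tested against the wrong character and "a" + acute would never compose, which is the shape of failure the new regression test is meant to catch.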

--
Best regards,
Alexander Borisov

Attachment                                                       Content-Type  Size
v12-0001-Add-TwoStageTable.pm-a-Perl-helper-for-two-stage.patch  text/plain    14.1 KB
v12-0002-Improve-the-performance-of-Unicode-Normalization.patch  text/plain    819.4 KB
v12-0003-Refactor-Unicode-Normalization-Forms-for-perform.patch  text/plain    353.9 KB
