| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | pgsql-committers(at)lists(dot)postgresql(dot)org |
| Subject: | pgsql: Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE) |
| Date: | 2026-06-04 22:50:39 |
| Message-ID: | E1wVGtb-0016LQ-1k@gemulon.postgresql.org |
| Lists: | pgsql-committers |
Fix off-by-one with NFC recomposition for Hangul U+11A7 (TBASE)
The NFC recomposition treated TBASE (U+11A7) as a valid T (trailing
consonant) jamo, which is incorrect based on the Unicode specification:
TBASE is one below the start of the T range, which begins at U+11A8.
This caused TBASE to be silently swallowed during normalization,
leading to an incorrect result.
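
To illustrate the boundary, here is a minimal standalone sketch of the LV + T composition step from the Unicode Hangul algorithm. It is not the code from src/common/unicode_norm.c: the helper names is_lv_syllable() and compose_lv_t() are hypothetical, and only the constants come from the Unicode Standard. With an inclusive lower bound (t >= TBASE), the arithmetic would add zero to the LV syllable and drop U+11A7 from the output, which matches the symptom described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hangul constants as defined by the Unicode Standard. */
#define SBASE	0xAC00		/* first precomposed Hangul syllable */
#define TBASE	0x11A7		/* one below the first trailing consonant */
#define TCOUNT	28			/* 27 trailing consonants + "no trailing" slot */
#define SCOUNT	11172		/* 19 * 21 * 28 precomposed syllables */

/* Hypothetical helper: true if code is an LV syllable (no trailing consonant). */
static bool
is_lv_syllable(uint32_t code)
{
	return code >= SBASE && code < SBASE + SCOUNT &&
		(code - SBASE) % TCOUNT == 0;
}

/*
 * Hypothetical helper: compose an LV syllable with a trailing consonant.
 * The valid T range is U+11A8..U+11C2, so the lower bound is exclusive:
 * TBASE itself is not a trailing consonant and must not be composed.
 */
static bool
compose_lv_t(uint32_t lv, uint32_t t, uint32_t *result)
{
	if (!is_lv_syllable(lv))
		return false;
	if (t <= TBASE || t >= TBASE + TCOUNT)
		return false;
	*result = lv + (t - TBASE);
	return true;
}
```

For example, under these assumptions U+AC00 followed by U+11A8 composes to U+AC01, while U+AC00 followed by U+11A7 is left as two separate code points.
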
A couple of regression tests are added to check more patterns with
Hangul recomposition and decomposition, on top of a test to check the
problem with TBASE. Diego has submitted the code fix, and I have
written the tests.
Author: Diego Frias <mail(at)dzfrias(dot)dev>
Co-authored-by: Michael Paquier <michael(at)paquier(dot)xyz>
Discussion: https://postgr.es/m/B92ED640-7D4A-4505-B09F-3548F58CBB16@dzfrias.dev
Backpatch-through: 14
Branch
------
REL_17_STABLE
Details
-------
https://git.postgresql.org/pg/commitdiff/0c9cbbfb5be79d2061d7f897f6c6f4bccb886062
Modified Files
--------------
src/common/unicode_norm.c | 2 +-
src/test/regress/expected/unicode.out | 78 +++++++++++++++++++++++++++++++++++
src/test/regress/sql/unicode.sql | 20 +++++++++
3 files changed, 99 insertions(+), 1 deletion(-)