pgsql: Update GB18030 encoding from version 2000 to 2022

From: John Naylor <john(dot)naylor(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Update GB18030 encoding from version 2000 to 2022
Date: 2025-09-24 06:29:13
Message-ID: E1v1J04-002Gtf-2x@gemulon.postgresql.org
Lists: pgsql-committers

Update GB18030 encoding from version 2000 to 2022

Mappings for 18 characters have changed, affecting 36 code points. This
is a break in compatibility, but these characters are rarely used.

U+E5E5 (Private Use Area) was previously mapped to \xA3A0. This code
point now maps to \x65356535. Attempting to convert \xA3A0 will now
raise an error.

Separate from the 2022 update, the following mappings were previously
swapped, and subsequently corrected in the 2005 and later versions:
* U+E7C7 (Private Use Area) now maps to \x8135F437
* U+1E3F (Latin Small Letter M with Acute) now maps to \xA8BC

The 2022 standard mentions the following policy changes, but they
have no effect in our implementation:

66 new ideographs are now required, but these are mapped
algorithmically, so they were already handled by utf8_and_gb18030.c.

Nine CJK compatibility ideographs are no longer required, but
implementations may retain them, as does the source we use from
the Unicode Consortium.
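
As an illustration of what "mapped algorithmically" means, here is a
minimal Python sketch of the linear four-byte GB18030 scheme. It is not
the code in utf8_and_gb18030.c, and all function names are hypothetical.
For simplicity it covers only the supplementary-plane range
(U+10000..U+10FFFF, starting at \x90308130); the BMP ranges that contain
the new ideographs use the same arithmetic with different range
endpoints, which are not reproduced here.

```python
# Illustrative sketch only; names are hypothetical, not PostgreSQL's.

SUPP_FIRST_UCS = 0x10000      # first supplementary-plane code point
SUPP_LAST_UCS = 0x10FFFF      # last Unicode code point
SUPP_FIRST_GB = 0x90308130    # GB18030 code for U+10000


def gb18030_linear(code: int) -> int:
    """Return the linear index of a four-byte GB18030 code."""
    b1 = (code >> 24) & 0xFF
    b2 = (code >> 16) & 0xFF
    b3 = (code >> 8) & 0xFF
    b4 = code & 0xFF
    # Bytes 1 and 3 range over 0x81..0xFE, bytes 2 and 4 over 0x30..0x39.
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a four-byte GB18030 code: {code:#x}")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def gb18030_from_linear(index: int) -> int:
    """Inverse of gb18030_linear: rebuild the four bytes from an index."""
    b4 = index % 10 + 0x30
    index //= 10
    b3 = index % 126 + 0x81
    index //= 126
    b2 = index % 10 + 0x30
    index //= 10
    b1 = index + 0x81
    if b1 > 0xFE:
        raise ValueError("index out of range for four-byte GB18030")
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4


def supplementary_to_gb18030(cp: int) -> int:
    """Map a supplementary-plane code point to its GB18030 code."""
    if not SUPP_FIRST_UCS <= cp <= SUPP_LAST_UCS:
        raise ValueError(f"not a supplementary-plane code point: {cp:#x}")
    return gb18030_from_linear(gb18030_linear(SUPP_FIRST_GB) + cp - SUPP_FIRST_UCS)


def gb18030_to_supplementary(code: int) -> int:
    """Map a four-byte GB18030 code back to a supplementary-plane code point."""
    cp = SUPP_FIRST_UCS + gb18030_linear(code) - gb18030_linear(SUPP_FIRST_GB)
    if not SUPP_FIRST_UCS <= cp <= SUPP_LAST_UCS:
        raise ValueError(f"outside the supplementary range: {code:#x}")
    return cp


if __name__ == "__main__":
    print(f"{supplementary_to_gb18030(0x20000):08X}")
```

Because the mapping inside such a range is pure arithmetic, code points
that a newer edition of the standard merely declares "required" need no
new table entries.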

Release notes: Compatibility section

For further details, see:
https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdf
https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132

Author: Chao Li <lic(at)highgo(dot)com>
Author: Zheng Tao <taoz(at)highgo(dot)com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/5334620eef8f7b429594e6cf9dc97331eda2a8bd

Modified Files
--------------
doc/src/sgml/charset.sgml | 2 +-
src/backend/utils/mb/Unicode/Makefile | 6 +-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl | 6 +-
src/backend/utils/mb/Unicode/gb18030_to_utf8.map | 1877 +++++++++++-----------
src/backend/utils/mb/Unicode/utf8_to_gb18030.map | 1340 +++++++--------
src/test/regress/expected/conversion.out | 7 +-
src/test/regress/sql/conversion.sql | 1 +
7 files changed, 1659 insertions(+), 1580 deletions(-)
