pgsql: Generate GB18030 mappings from the Unicode Consortium's UCM file

From: John Naylor <john(dot)naylor(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Generate GB18030 mappings from the Unicode Consortium's UCM file
Date: 2025-09-16 09:30:39
Message-ID: E1uyS1G-000ywA-20@gemulon.postgresql.org
Lists: pgsql-committers

Generate GB18030 mappings from the Unicode Consortium's UCM file

Previously we built the .map files for GB18030 (version 2000) from an
XML file. The 2022 version of this encoding is only available as a
Unicode Character Mapping (UCM) file, so, as preparatory refactoring,
switch to that format as the source for building version 2000.
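For readers unfamiliar with UCM: mapping lines between `CHARMAP` and
`END CHARMAP` pair a Unicode code point with a byte sequence and a
precision flag, where `|0` marks a round-trip mapping. The following is a
minimal Python sketch of reading such lines, assuming one code point per
line. `parse_ucm` is a hypothetical helper for illustration only; it is
not how UCS_to_GB18030.pl is actually written.

```python
import re

# Matches a single-code-point UCM mapping line such as:
#   <U0080> \x81\x30\x81\x30 |0
# Assumption: lines mapping sequences of several code points are skipped.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-4])"
)


def parse_ucm(lines):
    """Return {code point: bytes} for round-trip (|0) mappings found
    between the CHARMAP and END CHARMAP markers (hypothetical helper)."""
    mapping = {}
    in_charmap = False
    for raw in lines:
        line = raw.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        if not in_charmap:
            continue  # header section, e.g. <code_set_name>
        m = UCM_LINE.match(line)
        if m is None:
            continue  # comments, blank lines, multi-code-point entries
        codepoint = int(m.group(1), 16)
        hex_bytes = m.group(2).replace("\\x", "")
        if int(m.group(3)) == 0:  # keep round-trip mappings only
            mapping[codepoint] = bytes.fromhex(hex_bytes)
    return mapping
```

Filtering on `|0` is one possible policy; a real generator may also need
to handle fallback entries, depending on which direction a .map file serves.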

As we do with most input files for the conversion mappings, download
the file on demand. In order to generate the same mappings we have
now, we must download it from an earlier upstream commit rather than
from the head, since the latter contains a correction not present in
our current .map files.
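One way to confirm that a pinned upstream file reproduces the existing
tables, and to spot what a later upstream correction changes, is to diff
two parsed mapping tables. The Python sketch below assumes both tables are
plain `{code point: bytes}` dictionaries; `diff_mappings` is a
hypothetical helper, not part of the PostgreSQL tree.

```python
def diff_mappings(old, new):
    """Compare two {code point: bytes} tables (hypothetical helper).

    Returns (added, removed, changed), where changed maps each code point
    present in both tables to its (old, new) byte sequences.
    """
    added = {cp: new[cp] for cp in new.keys() - old.keys()}
    removed = {cp: old[cp] for cp in old.keys() - new.keys()}
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    return added, removed, changed
```

An empty result for all three dictionaries would indicate that the
generated mappings match the current ones.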

The XML file is still used by EUC_CN, so we cannot delete it from our
repository. GB18030 is a superset of EUC_CN, so it may be possible to
build EUC_CN from the same UCM file, but that is left for future work.

Author: Chao Li <lic(at)highgo(dot)com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cfa6cd29271e67c43c1040e3420c1145fdcdceb7

Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile | 5 +++-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl | 28 +++++++++++++++-------
.../utf8_and_gb18030/utf8_and_gb18030.c | 7 +++++-
3 files changed, 29 insertions(+), 11 deletions(-)
