From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <johncnaylorls(at)gmail(dot)com> |
Subject: | Re: GB18030-2022 Support in PostgreSQL |
Date: | 2025-08-11 02:01:08 |
Message-ID: | 3f12e2ab-6a20-4363-b72f-42502d1c36d3@gmail.com |
Lists: | pgsql-hackers |
I have created a CommitFest entry for this patch: https://commitfest.postgresql.org/patch/5954/.
CommitFest requested a rebase, so I rebased the code and created the v2
patch.
BTW, I have tested all 66 new characters, the 9 no-longer-required characters
and the 18 changed characters with queries like this one:
evantest=# SELECT encode(convert_from(decode('82359632', 'hex'), 'GB18030')::bytea, 'hex');
 encode
--------
 e9bfab
(1 row)
All of them were encoded correctly.
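For anyone who wants to cross-check the example above outside the database, here is a minimal Python sketch of the linear arithmetic used for four-byte GB18030 codes. It is not PostgreSQL's conversion code. The function names are hypothetical, and the range anchor (0x82358F33 maps to U+9FA6, with the range running to U+D7FF) is an assumption taken from the published ICU GB18030 range data.

```python
def gb18030_linear(code: bytes) -> int:
    """Return the linear index of a four-byte GB18030 code.

    Bytes 1 and 3 run over 0x81..0xFE (126 values),
    bytes 2 and 4 run over 0x30..0x39 (10 values).
    """
    if len(code) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = code
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a valid four-byte GB18030 code")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Assumed range anchor (from ICU's GB18030 range data, not from the patch):
# 0x82358F33 <-> U+9FA6, and the range extends to U+D7FF.
RANGE_FIRST_GB = bytes.fromhex("82358F33")
RANGE_FIRST_UCS = 0x9FA6
RANGE_LAST_UCS = 0xD7FF


def decode_four_byte(code: bytes) -> str:
    """Map a four-byte code in the assumed range to its Unicode character."""
    offset = gb18030_linear(code) - gb18030_linear(RANGE_FIRST_GB)
    cp = RANGE_FIRST_UCS + offset
    if not RANGE_FIRST_UCS <= cp <= RANGE_LAST_UCS:
        raise ValueError("code is outside the assumed linear range")
    return chr(cp)


ch = decode_four_byte(bytes.fromhex("82359632"))
print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex())  # prints: U+9FEB e9bfab
```

The printed UTF-8 bytes match the `e9bfab` returned by the query, which is consistent with four-byte codes being computed from ranges rather than looked up in the map files.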
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/
On 2025/8/7 16:14, Chao Li wrote:
> I did more research on the changes in GB18030-2022 relative to
> GB18030-2000. Here is a summary:
>
> * 66 new characters have been added in 2022. All of them are 4-byte
> characters. The map files store only the 2-byte GB code mappings, and
> 4-byte GB code mappings are calculated, so these characters can already
> be encoded/decoded properly without this patch. I tested that.
> * 9 characters are no longer required by 2022, but an application may
> decide whether to retain them. As the ucm file
> (https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/gb18030-2022.ucm)
> retains them, we also retain them.
> * Unicode mappings for 18 characters have changed. Only these changes
> can cause backward compatibility issues. However, half of them are
> rarely used punctuation marks and the rest are glyphs that I cannot
> recognize as a native Chinese speaker, so these changes should not
> significantly impact most existing databases.
>
> I added a test case for a character whose mapping changed, and the test passes:
>
> % make check
> ...
> # All 229 tests passed.
>
> For more details on the standard change, see
> https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
>
> I am attaching the patch file.
>
> Chao Li (Evan)
> ---------------------
> Highgo Software Co., Ltd.
> https://www.highgo.com/
>
>
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Upgrade-GB18030-encoding-support-from-2000-to-202.patch | text/plain | 2.0 MB |