From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: GB18030-2022 Support in PostgreSQL |
Date: | 2025-08-11 05:50:48 |
Message-ID: | CANWCAZaHbby890qkVQkjwW991fmYzJKXmfKEVhQtOYw+uh8Vhw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Aug 11, 2025 at 9:01 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> I have created a patch https://commitfest.postgresql.org/patch/5954/. CommitFests requested a rebase, so I rebased the code and created the v2 patch.
>
> BTW, I have tested all 66 new characters, 9 not-required characters and 18 changed characters in a way as:
"9 characters are no longer required by the new standard, but are
retained in this patch for compatibility"
How is that done?
> I added a test case with a mapping changed char, and the test passes:
>
> % make check
> ...
> # All 229 tests passed.
>
> For more details on the standard change, see https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
>
> I am attaching the patch file.
Going from the old .xml file to the .ucm file makes it difficult to
see the relevant changes. Also, there are nearly 1000 non-user-visible
changes like this in the output file that are not explained:
- /*** Three byte table, leaf: efa8xx - offset 0x07aba ***/
+ /*** Three byte table, leaf: efa8xx - offset 0x07b3a ***/
The 2000 version is available in the .ucm format, so maybe converting
to that first would be a good preparatory patch:
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/gb-18030-2000.ucm
Judging by its history, that file has seen small
revisions, so it may take some research to get the exact equivalent of
the XML file we use. That will also tell us whether anything will change
for us besides the actual 2022 revision.
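For illustration only, here is a rough sketch of how two .ucm files
could be compared once both versions are in that format. It is not part
of the patch or of PostgreSQL's conversion scripts. The function names
are hypothetical, and it assumes each mapping line has the simple form
"<UXXXX> \xHH\xHH |N" and that only round-trip (|0) entries matter:

```python
import re

# Matches a UCM CHARMAP line such as "<U00A4> \xA1\xE8 |0".
# Assumption: one code point per line, with an optional "|N" precision flag.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def parse_ucm(text):
    """Return {code_point: encoded_bytes} for round-trip (|0) mappings."""
    mapping = {}
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # header, comment, or CHARMAP / END CHARMAP marker
        flag = m.group(3) or "0"
        if flag != "0":
            continue  # this sketch ignores fallback and other non-round-trip entries
        code_point = int(m.group(1), 16)
        encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
        mapping[code_point] = encoded
    return mapping


def diff_mappings(old, new):
    """Return (added, removed, changed) between two parsed mappings."""
    added = {cp: new[cp] for cp in new.keys() - old.keys()}
    removed = {cp: old[cp] for cp in old.keys() - new.keys()}
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    return added, removed, changed
```

Running parse_ucm() on the contents of the 2000 and 2022 files and
passing the results to diff_mappings() would list added, removed, and
remapped code points directly, independent of the offset churn in the
generated map file.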
--
John Naylor
Amazon Web Services