From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Patch 1 of GB18030-2022 support |
Date: | 2025-08-12 02:09:42 |
Message-ID: | 8065E862-A017-4B04-A956-F1FE3415B5B9@gmail.com |
Lists: | pgsql-hackers |
Hi John,
>
> So this patch should have no any impact to build-out binaries of PostgreSQL.
>
> Patch 2 will update gb-18030-2000.ucm with the latest version, and patch 3 will upgrade ucm to the 2022 version.
>
I am preparing patch 2. My original plan was to update to the latest version of gb-18030-2000.ucm, but after some research I think I may want to change the plan.
Looking at https://github.com/unicode-org/icu-data/commit/91850aec0209fee0f91248731c9dd4b2b768d2b5, a mapping error fix was made to gb-18030-2000.ucm on Apr 23, 2011 in that commit. The fix changed the mapping of GB code 0xA8BC from U+E7C7 to U+1E3F. But I noticed that the same change was also made in gb-18030-2005.ucm.
Then I asked Gemini and ChatGPT, and both told me that the mapping change was introduced in GB18030-2005. So it looks like the change should only have been made in gb-18030-2005.ucm.
So I compared the 2000 ucm with the 2005 ucm, and the 2005 ucm with the 2022 ucm. I found that some changes made in 2005 are reverted in 2022, which is why the diff between 2000 and 2022 is small. For example, the following mappings
```
<U20087> \xFE\x51 |0
<U20089> \xFE\x52 |0
<U200CC> \xFE\x53 |0
<U215D7> \xFE\x6C |0
<U2298F> \xFE\x76 |0
<U241FE> \xFE\x91 |0
```
were added to the 2005 ucm and removed from the 2022 ucm.
Another example is the 2000 mapping "<UE816> \xFE\x51 |0", which was changed to "<UE816> \xFE\x51 |1" in 2005 and changed back to "<UE816> \xFE\x51 |0" in 2022.
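For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch. It is only an illustration, not the exact procedure used for the comparison described here. The names `parse_ucm` and `diff_ucm` are hypothetical, and the regex assumes mapping lines of the form `<UXXXX> \xNN\xNN |N` like the ones quoted in this message; all other lines are skipped.

```python
import re

# Assumed line shape, taken from the examples in this message:
#   <U20087> \xFE\x51 |0
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm(text):
    """Return {(code_point, gb_bytes): flag} for every mapping line in text."""
    mappings = {}
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # header, comment, blank line, or anything unrecognized
        code_point = int(m.group(1), 16)
        gb_bytes = bytes.fromhex(m.group(2).replace("\\x", ""))
        mappings[(code_point, gb_bytes)] = int(m.group(3))
    return mappings


def diff_ucm(old, new):
    """Compare two parsed tables; return (added, removed, reflagged) key lists."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    reflagged = sorted(k for k in old if k in new and old[k] != new[k])
    return added, removed, reflagged


if __name__ == "__main__":
    # Tiny inline samples built from the mappings quoted in this message.
    ucm_2000 = parse_ucm(r"<UE816> \xFE\x51 |0")
    ucm_2005 = parse_ucm("<UE816> \\xFE\\x51 |1\n<U20087> \\xFE\\x51 |0")
    added, removed, reflagged = diff_ucm(ucm_2000, ucm_2005)
    print("added:", added)
    print("removed:", removed)
    print("reflagged:", reflagged)
```

Running `diff_ucm` once on the 2000 and 2005 tables and once on the 2005 and 2022 tables should make it easy to spot entries that appear in the first diff as added and in the second as removed, which are the changes that 2022 reverts.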
So, for patch 2, I think we have three options:
1. As planned, update to the latest version of the 2000 ucm, then skip 2005 and upgrade directly to 2022 in patch 3. This way, we simply honor the 2000 ucm, even though the change was actually introduced by 2005.
2. Skip the latest version of the 2000 ucm and upgrade to the 2005 ucm. This clearly shows the upgrade path 2000->2005->2022. The downside is that 2005 introduced some changes that are reverted in 2022, which will cause some unnecessary changes in the map files.
3. Skip patch 2 and go directly to patch 3, so that patch 3 includes the changes introduced by both 2005 and 2022. This makes the minimum changes to the map files.
I would prefer option 2 or 3, and lean slightly toward 3. What do you think?
Best regards,
Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/