From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: GB18030-2022 Support in PostgreSQL |
Date: | 2025-09-10 11:54:08 |
Message-ID: | CAEoWx2=EoJYa8HRXaWOnoaD58MonkwN8pTKkc_7=oj6Zjs0P=w@mail.gmail.com |
Lists: | pgsql-hackers |
Hi John,
Thank you very much for taking care of this patch.
John Naylor <johncnaylorls(at)gmail(dot)com> wrote on Wed, Sep 10, 2025 at 14:38:
>
> - The URL at the top currently points to a directory in Github, but v3
> changed it to point to the actual file. A directory can be navigated
> for inspection, so I used:
>
> 2000:
> https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm
>
> 2022:
> https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/
>
>
Looks good.
> - I also made the regex a multiline regex for readability, even though
> the previous one was not.
>
>
Thank you very much for polishing the Perl script. I am not a Perl expert;
I can make the script work, but I cannot make it perfect.
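For readers unfamiliar with the multiline-regex style mentioned above, here is a minimal sketch in Python rather than Perl. It is not the script from the patch: the names `UCM_LINE` and `parse_ucm_line` are hypothetical, and the sample line in the usage below is illustrative, assuming the usual `<UXXXX> \xNN... |flag` layout of ICU .ucm mapping lines.

```python
import re

# Hypothetical pattern for one mapping line of an ICU .ucm file.
# re.VERBOSE lets the pattern span several lines with inline comments,
# similar in spirit to Perl's /x modifier.
UCM_LINE = re.compile(
    r"""
    ^<U([0-9A-Fa-f]{4,6})>      # Unicode code point, e.g. <U4E02>
    \s+
    ((?:\\x[0-9A-Fa-f]{2})+)    # encoded bytes, e.g. \x81\x40
    \s+
    \|(\d)                      # precision flag; 0 marks a round-trip mapping
    """,
    re.VERBOSE,
)


def parse_ucm_line(line):
    """Return (codepoint, encoded_bytes, flag), or None if the line is not a mapping."""
    m = UCM_LINE.match(line)
    if m is None:
        return None
    codepoint = int(m.group(1), 16)
    encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
    flag = int(m.group(3))
    return codepoint, encoded, flag
```

Calling `parse_ucm_line("<U4E02> \\x81\\x40 |0")` yields the code point, the byte sequence, and the flag, while comment or header lines yield `None`. A caller that only wants round-trip mappings could keep entries whose flag is 0.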
> For 2022 version, I think it would be good to once run a test to
> verify that no mappings changed that we didn't expect. Perhaps the
> tests here can be used:
>
>
> https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
>
>
I have manually rerun the tests I had done before, and everything works as expected.
I downloaded the tests from the referenced mail, but I cannot get them
to run. After I extracted the 2 patch files, src/test/encodings was added,
but "make check" does not seem to run the tests in it. I tried copying the
.out and .sql files to src/test/regress, but the tests still did not run.
Did I miss anything?
> The upstream correction to the 2000 version is not present in our
> mappings, so we should mention that, unless it was reverted in or
> before 2022.
>
I think the upstream correction to the 2000 version only covers a few
non-round-trip characters, which we ignore. So I feel we don't need to
mention them.
>
> In the documentation (charset.sgml), do we want to mention the version
> e.g. the following?
>
> <entry><literal>GB18030</literal></entry>
> -<entry>National Standard</entry>
> +<entry>National Standard, version 2022</entry>
>
That's a good idea. I updated the sgml file:
[image: image.png]
>
> I've whacked around the commit messages, so those should be reviewed
> for accuracy.
>
> Your draft commit message had "9 characters are no longer required by
> the new standard, but are retained in this patch for compatibility"
> ...but those nine were introduced in the 2005 version, right? In which
> case it doesn't affect us. Please confirm.
>
I can't find any indication of whether the 9 characters were introduced in
the 2005 version.
But even without this patch, they are converted properly:
```
evantest=# SELECT encode(convert_from(decode('FD9D', 'hex'),
'GB18030')::bytea, 'hex');
encode
--------
efa5b9
(1 row)
```
So they should already be available in the 2000 version.
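To read the hex result above as a code point, the following minimal Python sketch decodes the UTF-8 bytes that the query returned. It only interprets the output shown and does not call any GB18030 converter; the variable names are illustrative.

```python
# Hex string copied from the psql output above (UTF-8 bytes of the result).
hex_output = "efa5b9"

# Decode the UTF-8 bytes into a single character and format its code point.
char = bytes.fromhex(hex_output).decode("utf-8")
codepoint = f"U+{ord(char):04X}"

print(codepoint)  # prints U+F979
```

U+F979 lies in the CJK Compatibility Ideographs block (U+F900 to U+FAFF).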
>
> "Author: Zheng Tao <taoz(at)highgo(dot)com>" -- I haven't seen any messages
> from this address in this thread, so could you confirm this was
> intentional?
>
>
Yes, Zheng Tao is my colleague. He worked with me on this patch, so I want
to credit him.
I am attaching v5. The only change is in 0003, where I added the SGML change.
Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachment | Content-Type | Size |
---|---|---|
v5-0002-JCN-changes.patch | application/octet-stream | 2.4 KB |
v5-0001-Generate-GB18030-mappings-from-the-Unicode-Consor.patch | application/octet-stream | 4.9 KB |
v5-0003-Update-GB18030-encoding-from-version-2000-to-2022.patch | application/octet-stream | 456.6 KB |