From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: GB18030-2022 Support in PostgreSQL |
Date: | 2025-08-18 05:18:25 |
Message-ID: | CANWCAZbBEUuby3pejOq6L0b3OhEa3B9XQz=EziYAYkNpOODsig@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 13, 2025 at 3:08 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> Attached is the new patch. It downloads the UCM file in make:
> After regenerating the map files, there is no change found in the map files.
I can confirm, thanks.
When we split a patch into multiple patches, it's customary to include all of
them, since that process may result in unwelcome artifacts to sort
out. (When the first step has architectural questions or a change in
behavior, we may treat it as independent, possibly with a separate
thread, but that's not the case here.) I do have some comments
already, though:
-my $in_file = "gb-18030-2000.xml";
-
+my $in_file = "gb-18030-2000.ucm";
-while (<$in>)
-{
+while (<$in>) {
-# The lines we care about in the source file look like
+# The lines we care about in the source file look like:
These are spurious changes, which we try to avoid.
- next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
+ if (/^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|(\d+)/) {
This change in style caused extra whitespace-only churn. That obscures
what the actual changes are.
+ # Match lines like: <UXXXX> \xYY[\xYY...] |n, and use only (|0) mappings
This is missing an explanation of why we skip non-zero mappings.
Code-wise, this only matters for the output in the follow-on patch for
2022, but one of these patches needs to include a brief explanation. I
did not like the detailed description that was present in one of the
earlier 2022 patches that told how many characters were flagged a
certain way -- that's irrelevant detail and will likely get out of
date in some future version anyway.
+# and n is a flag indicating the type of mapping having
+# a single value of 0.
This seems weird when combined with the logic to filter out non-zero
mappings. We need to think about when and where to show relevant
information.
+ next if ($flag ne '0'); # non-0 flags
This comment is just repeating what the code is doing, and it's very
obvious what it's doing.
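To make the discussion concrete, here is a minimal standalone sketch of the parsing and filtering that the quoted hunks describe. It is not the patch itself: the sample lines are hypothetical stand-ins written in UCM syntax rather than copied from the real file, and the names @lines and @mapping are made up for illustration. In ICU's UCM format, a flag of |0 marks a roundtrip mapping and the other values mark fallback or one-way mappings, which is presumably why only |0 entries are wanted here.

```perl
use strict;
use warnings;

# Hypothetical sample input in UCM syntax (illustrative, not taken
# from the real .ucm file).
my @lines = (
	'<U0080> \x81\x30\x81\x30 |0',
	'<U00A4> \xA1\xE8 |0',
	'<UE5E5> \xA1\xA1 |3',    # made-up non-roundtrip entry
	'# a comment line',
);

my @mapping;
foreach my $line (@lines)
{
	# Match lines like: <UXXXX> \xYY[\xYY...] |n
	next
	  if $line !~
	  /^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|(\d+)/;
	my ($u, $bytes, $flag) = ($1, $2, $3);

	# Keep only roundtrip mappings; other flag values denote
	# fallback or one-way mappings in UCM.
	next if $flag ne '0';

	# Turn "\x81\x30\x81\x30" into the hex string "81308130".
	$bytes =~ s/\\x//g;

	push @mapping, { ucs => hex($u), code => hex($bytes) };
}

printf "U+%04X -> 0x%X\n", $_->{ucs}, $_->{code} foreach @mapping;
# prints:
# U+0080 -> 0x81308130
# U+00A4 -> 0xA1E8
```

Note that the filter is expressed with "next if", in the same style as the original XML matching line, so the loop body keeps its indentation and a diff against the old script would show only the lines that actually changed.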
BTW, it sounds like your proposed Makefile changes are needed for the
follow-on patch with .map changes to work at all, is that right?
https://www.postgresql.org/message-id/1CA8625F-AA41-4ED2-B60F-E28AC71F37DC@highgo.com
--
John Naylor
Amazon Web Services