Quick Links

Re: GB18030-2022 Support in PostgreSQL

From:	Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To:	John Naylor <johncnaylorls(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: GB18030-2022 Support in PostgreSQL
Date:	2025-08-18 06:35:42
Message-ID:	ab0491fa-2c63-437c-b5aa-889607dacdd0@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2025/8/18 13:18, John Naylor wrote:
> We split a patch into multiple patches, it's customary include all of
> them, since that process may result in unwelcome artifacts to sort
> out. (When the first step has architectural questions or change in
> behavior, we may treat it as independent, possibly with a separate
> thread, but that's not the case here.)

Thanks for the explanation. I thought to make the second patch only
after the first patch is pushed. I am new to PostgreSQL contribution,
your guidance is very helpful for my future work.

Now I attach the both patch files.

For the second patch, I have tested it manually again. And "make check"
test passed.

> -# The lines we care about in the source file look like
> +# The lines we care about in the source file look like:
>
> These are spurious changes, which we try to avoid.

Updated.

> - next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
>
> + if (/^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|(\d+)/) {
>
> This change in style caused extra whitespace-only churn. That obscures
> what the actual changes are.

Updated.

> + # Match lines like: <UXXXX> \xYY[\xYY...] |n, and use only (|0) mappings
>
> This is missing an explanation of why we skip non-zero mappings.
> Code-wise, this only matters for the output in the follow-on patch for
> 2022, but one of these patches needs to include a brief explanation. I
> did not like the detailed description that was present in one of the
> earlier 2022 patches that told how many characters were flagged a
> certain way -- that's irrelevant detail and will likely get out of
> date in some future version anyway.

Okay, I kept a neat version of comment now.

> +# and n is a flag indicating the type of mapping having
> +# a single value of 0.
>
> This seems weird when combined with the logic to filter out non-zero
> mappings. We need to think about when and where to show relevant
> information.

Updated the comment.

> + next if ($flag ne '0'); # non-0 flags
>
> This comment is just repeating what the code is doing, and it's very
> obvious what it's doing.

Removed the useless comment.

>
> BTW, it sounds like your proposed Makefile changes are needed for the
> follow-on patch with .map changes to work at all, is that right?
>
> https://www.postgresql.org/message-id/1CA8625F-AA41-4ED2-B60F-E28AC71F37DC@highgo.com
>
I think that patch could be separate, because the makefile changes are
generic to all map files. The current GB18030 patch doesn't depend on
that makefile patch at all. The makefile patch just makes build a little
bit easier upon map file changes.

Best regards,

Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment	Content-Type	Size
v1-0001-Upgrade-GB18030-encoding-support-from-2000-to-202.patch	text/plain	456.8 KB
v2-0001-GB18030-Switch-to-using-gb-18030-2000.ucm.patch	text/plain	4.8 KB

In response to

Re: GB18030-2022 Support in PostgreSQL at 2025-08-18 05:18:25 from John Naylor

Responses

Re: GB18030-2022 Support in PostgreSQL at 2025-08-18 08:34:29 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Laurenz Albe	2025-08-18 06:40:02	Re: analyze-in-stages post upgrade questions
Previous Message	Dilip Kumar	2025-08-18 06:32:15	Re: Conflict detection for update_deleted in logical replication