Re: GB18030-2022 Support in PostgreSQL

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: GB18030-2022 Support in PostgreSQL
Date: 2025-09-24 09:31:39
Message-ID: CAEoWx2nJeJu0s8YW+_ckikxSjHRgjVsK0HPQZFZ=N-1HEsyhVQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 24, 2025 at 5:18 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:

>
> On Sep 24, 2025, at 15:04, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> On Sep 24, 2025, at 14:42, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> Sounds good. Were you also interested in seeing if EUC_CN can use the
> same UCM file? That would allow us to get rid of the XML file.
>
>
> Sure, let me take a look.
>
>
> I found that both EUC_CN and UHC use the same XML file, so I updated both.
>
> I didn’t delete gb-18030-2000.xml in this patch because that would make
> the patch file very large; you can simply add the deletion to the commit
> when you push it.
>
> Basically, the changes are all borrowed from the previous commit. With
> this patch, regenerating the map files produces no changes, which is
> expected:
>
> ```
> % make utf8_to_uhc.map utf8_to_euc_cn.map
> '/usr/bin/perl' -I . UCS_to_UHC.pl
> - Writing UTF8=>UHC conversion table: utf8_to_uhc.map
> - Writing UHC=>UTF8 conversion table: uhc_to_utf8.map
> '/usr/bin/perl' -I . UCS_to_EUC_CN.pl
> - Writing UTF8=>EUC_CN conversion table: utf8_to_euc_cn.map
> - Writing EUC_CN=>UTF8 conversion table: euc_cn_to_utf8.map
>
> % git diff # no map file change
> %
> ```
>
> I am not sure whether you should also upgrade the UCM file to the 2022
> version, but if we need to, let’s do it in a separate commit.
>
>
I included the deletion of the XML file in v2, which makes it easier to
confirm that the build still passes. I realized that the patch files were
huge because of the map file changes.
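
For anyone who wants to repeat the "no map file change" check quoted above
without relying on git, here is a minimal sketch. It is not part of the
patch: the helper names `snapshot` and `changed_files` are hypothetical, and
the directory in the usage note is an assumption about where the map files
live in the source tree.

```python
import hashlib
from pathlib import Path


def snapshot(directory, pattern="*.map"):
    """Return {file name: SHA-256 hex digest} for files matching pattern."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(directory).glob(pattern))
    }


def changed_files(before, after):
    """Names whose digest differs, or that exist in only one snapshot."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

The intended use is to call `snapshot()` on the map directory (assumed here
to be src/backend/utils/mb/Unicode), rerun the `make` targets, call
`snapshot()` again, and pass both results to `changed_files()`. An empty
list corresponds to the empty `git diff` shown above.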

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment Content-Type Size
v2-0001-Generate-EUC_CN-and-UHC-mappings-from-the-Unicode.patch application/octet-stream 862.6 KB
