Re: GB18030-2022 Support in PostgreSQL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, JiaoShuntian <jiaoshuntian(at)highgo(dot)com(dot)w(dot)kunlunaq(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GB18030-2022 Support in PostgreSQL
Date: 2025-08-04 13:51:01
Message-ID: 851769.1754315461@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 2025-08-04 Mo 6:35 AM, John Naylor wrote:
>> There is a risk of breaking applications, although only a few dozen
>> mappings changed. If it were added as a separate encoding, users could
>> opt in.

> That makes sense ... naming the new encoding so as to avoid confusion
> might be a challenge.

We have precedent for that in SHIFT_JIS_2004. Presumably if we
make this a new encoding, it'd be GB18030_2022.

However, adding a new encoding ID is not without breakage risks
of its own, stemming from some code knowing the new ID and other
code not. I recall that we had some actual problems of that ilk when
we added SHIFT_JIS_2004, and some of them were pretty subtle.
See e.g. this comment from src/bin/initdb/Makefile:

# Note: it's important that we link to encnames.o from libpgcommon, not
# from libpq, else we have risks of version skew if we run with a libpq
# shared library from a different PG version. Define
# USE_PRIVATE_ENCODING_FUNCS to ensure that that happens.

That was long enough ago that I have little faith either that that
fix still does what it intended to (the code has been rejiggered
significantly since the issue was last battle-tested), or that
there are not similar hazards elsewhere.

So on the whole I'd lean a bit towards just redefining GB18030 as
meaning the new standard. The fact that we don't support it as a
server-side encoding perhaps makes that idea more tenable than it
would be if the encoding governed the interpretation of our own
stored data.

regards, tom lane
