Re: BUG #19354: JOHAB rejects valid byte sequences

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtvjtv(at)gmail(dot)com>, VASUKI M <vasukianand0119(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: 2026-04-14 06:30:08
Message-ID: CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com
Lists: pgsql-bugs

On Wed, Dec 17, 2025 at 7:43 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think there is a good chance that the right going-forward fix is to
> deprecate the encoding, because according to
> https://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt this and
> everything else that's now under
> https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/ were
> deprecated in 2001. By the time v19 is released, the deprecation will
> be a quarter-century old, and the fact that it doesn't work is good
> evidence that few people will miss it, though perhaps the original
> poster will want to put forward an argument for why we should still
> care about this.

Right, that stuff was withdrawn, along with the BIG5 and JIS X 0212
mappings (here's some interesting discussion about their normative
status[1]). From what I can figure out, JOHAB was an MS-DOS codepage
(1361), obsoleted by UHC (949) some time around MS-DOS 6.22 or MS-DOS
7 and Windows 95.

So +1 from me, set the phasers to git rm. Based on the comments for
enum pg_enc, we don't need to worry about numerical stability of
client-only encodings, so I just deleted it (unlike PG_MULE_INTERNAL
which became PG_UNUSED_1). I didn't mention it in
doc/src/sgml/appendix-obsolete.sgml: the decision criterion for that
seems to be that there was an SGML id that appeared in a URL, which is
not the case here. The release notes seem like enough of a tombstone
for something that we strongly suspect has 0 users. Wait until v20, or
just do it now?

I don't have an opinion yet on whether the code in the back-branches
might be dangerous, or "fixing" it might be more dangerous, but it's
an interesting question...

[1] https://unicode.org/mail-arch/unicode-ml/y2002-m03/0691.html

Attachment: 0001-Remove-JOHAB-encoding.patch.gz (application/gzip, 126.5 KB)
