Re: BUG #19354: JOHAB rejects valid byte sequences

From:	VASUKI M <vasukianand0119(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeroen Vermeulen <jtvjtv(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #19354: JOHAB rejects valid byte sequences
Date:	2025-12-16 06:23:48
Message-ID:	CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com
Lists:	pgsql-bugs

Thanks all. That analysis makes a lot of sense.

Given the lack of a clear spec, the existence of multiple JOHAB variants, and
how long this has apparently been "working" without anyone noticing, IMHO
desupporting it does seem like the least risky option. At this point, trying
to fix the JOHAB variants feels like opening a pretty big can of
worms, especially with the potential for dump/reload surprises or subtle
parsing/security issues.

I don't have additional data to add, but +1 on removal or deprecation being
a reasonable outcome here, given how obscure and effectively dead the
encoding is nowadays.

Thanks for digging into this.

Cheers,
Vasuki M

On Tue, Dec 16, 2025 at 11:46 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeroen Vermeulen <jtvjtv(at)gmail(dot)com> writes:
> > This bit worries me: "Other, vendor-defined, Johab variants also
> exist" —
> > such as an EBCDIC-based one and a stateful one!
>
> Yeah. So what we have here is:
>
> 1. Our JOHAB implementation has apparently been wrong since day one.
>
> 2. Wrongness may be in the eye of the beholder, since there are
> multiple versions of JOHAB.
>
> 3. Your complaint is the first, AFAIR.
>
> 4. That wikipedia page says "Following the introduction of Unified
> Hangul Code by Microsoft in Windows 95, and Hangul Word Processor
> abandoning Johab in favour of Unicode in 2000, Johab ceased to be
> commonly used."
>
> Given these things, I wonder if we shouldn't desupport JOHAB
> rather than attempt to fix it. Fixing would likely be a significant
> amount of work: if we don't even have the character lengths right,
> how likely is it that our conversions to other character sets are
> correct? I also worry that if different PG versions have different
> ideas of the mapping, there could be room for dump/reload problems,
> and maybe even security problems related to the backslash issue.
>
> regards, tom lane

In response to

Re: BUG #19354: JOHAB rejects valid byte sequences at 2025-12-16 00:56:09 from Tom Lane

Responses

Re: BUG #19354: JOHAB rejects valid byte sequences at 2025-12-16 07:42:09 from Jeroen Vermeulen
