Re: BUG #19354: JOHAB rejects valid byte sequences

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtvjtv(at)gmail(dot)com>, VASUKI M <vasukianand0119(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: 2025-12-17 02:59:17
Message-ID: aUIchajpeYVTF4BT@paquier.xyz
Lists: pgsql-bugs

On Tue, Dec 16, 2025 at 10:41:46AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm
>> left with the conclusions that (1) nobody ever actually tried using
>> this encoding for anything real until 3 days ago and (2) we don't have
>> any testing infrastructure that verifies that the characters in the
>> mapping tables are actually accepted by pg_verifymbstr(). I wonder how
>> many other encodings we have that don't actually work?
>
> Indeed. Anyone want to do some testing?

FWIW, a colleague made me aware a couple of weeks ago that SJIS and
SHIFT_JIS_2004 are used by some customers, and that the conversion
mappings in the tree are many years out of date, with Postgres not
understanding some of the characters. These two encodings are marginal
in the mostly-UTF8 world we live in these days, but it's annoying: the
existing byte sequences should not change across the years, and the
mappings should just be refreshed with new data.
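Robert's point about having no check that the characters in the mapping tables are accepted by the verifier could be covered by a simple loop over the table entries. A minimal sketch in Python, under loudly stated assumptions: Python's built-in "johab" codec stands in for the mapping table, and euc_style_verify() is a hypothetical, deliberately over-strict checker standing in for a verifier like pg_verifymbstr(). Neither is code from the Postgres tree.

```python
"""Hypothetical sketch: report mapping-table byte sequences that a verifier rejects.

Assumptions (not from the Postgres tree):
- Python's built-in "johab" codec stands in for the mapping table entries.
- euc_style_verify() is an invented, over-strict verifier used only to show
  how a mismatch would be reported; it is NOT pg_verifymbstr().
"""
from typing import Callable, Iterable, List


def johab_sequences_from_python_codec() -> List[bytes]:
    """Collect two-byte sequences that Python's "johab" codec decodes to one character."""
    seqs = []
    for lead in range(0x84, 0xFA):        # candidate high-bit lead bytes
        for trail in range(0x31, 0xFF):   # candidate trail bytes
            seq = bytes([lead, trail])
            try:
                text = seq.decode("johab")
            except UnicodeDecodeError:
                continue                  # not a valid JOHAB character
            if len(text) == 1:
                seqs.append(seq)
    return seqs


def euc_style_verify(seq: bytes) -> bool:
    """HYPOTHETICAL over-strict verifier: both bytes must be in 0xA1-0xFE,
    as in EUC-style encodings. JOHAB allows lower lead and trail bytes,
    so this rejects valid characters."""
    return len(seq) == 2 and all(0xA1 <= b <= 0xFE for b in seq)


def find_rejected(sequences: Iterable[bytes],
                  verify: Callable[[bytes], bool]) -> List[bytes]:
    """Return the sequences from the mapping table that verify() rejects."""
    return [seq for seq in sequences if not verify(seq)]


if __name__ == "__main__":
    table = johab_sequences_from_python_codec()
    rejected = find_rejected(table, euc_style_verify)
    print(f"{len(rejected)} of {len(table)} valid JOHAB sequences rejected")
    for seq in rejected[:5]:
        # ascii() keeps the output printable on any console encoding
        print(seq.hex(), ascii(seq.decode("johab")))
```

Against a real server, the verify callback would instead send each sequence through the backend's own validation rather than a Python function; that wiring is left out of this sketch.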
--
Michael