| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Jeroen Vermeulen <jtvjtv(at)gmail(dot)com>, VASUKI M <vasukianand0119(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19354: JOHAB rejects valid byte sequences |
| Date: | 2025-12-16 15:41:46 |
| Message-ID: | 2393116.1765899706@sss.pgh.pa.us |
| Lists: | pgsql-bugs |

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... So I went looking for
> where we got the mapping tables from. UCS_to_JOHAB.pl expects to read
> from a file JOHAB.TXT, of which the latest version seems to be found
> here:
> https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT
> And indeed, if I run UCS_to_JOHAB.pl on that JOHAB.txt file, it
> regenerates the current mapping files.

Thanks for doing that research!

> So apparently we've
> got the "right" mappings, but you can only actually use the ones that
> match the code's rules for something to be a valid multi-byte
> character, which aren't actually in sync with the mapping table.

Yeah. Looking at the code in wchar.c, it's clear that it thinks
that JOHAB has the same character-length rules as EUC_KR, which is
something that one might guess based on available documentation that
says it's related to that encoding. So I can see how we got here.
However, that doesn't mean we can just fix pg_johab_mblen() and be done.
I'm still quite afraid that we'd be introducing security-grade
inconsistencies of interpretation between different PG versions.
> I'm
> left with the conclusions that (1) nobody ever actually tried using
> this encoding for anything real until 3 days ago and (2) we don't have
> any testing infrastructure that verifies that the characters in the
> mapping tables are actually accepted by pg_verifymbstr(). I wonder how
> many other encodings we have that don't actually work?

Indeed. Anyone want to do some testing?
regards, tom lane