From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | hlinnaka(at)iki(dot)fi |
Cc: | horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Illegal SJIS mapping |
Date: | 2016-10-18 05:34:33 |
Message-ID: | 20161018.143433.1192646816835803355.t-ishii@sraoss.co.jp |
Lists: | pgsql-hackers |
> However, running the script with that doesn't produce exactly what we
> have in utf8_to_sjis.map, either. It's otherwise the same, but we have
> some extra mappings:
>
> - {0xc2a5, 0x5c},
0xc2a5 is U+00a5. The glyph is "YEN SIGN", which corresponds to
0x5c in SJIS. So this is a valid mapping.
Meanwhile, Microsoft wants to map U+005c to 0x5c in CP932. The
glyph of U+005c is "REVERSE SOLIDUS" (backslash). So MS
decided that the glyph of U+005c is "YEN SIGN" in CP932!
In summary, we need to keep both mappings:
U+00a5 (UTF-8 0xc2a5) -> 0x5c and U+005c -> 0x5c.
Obviously, this breaks round-trip conversion between UTF8 and SJIS
in this case, though, since 0x5c can convert back to only one of them.
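A minimal Python sketch of why keeping both entries breaks the round trip. The two-entry table is a hypothetical excerpt keyed by packed UTF-8 bytes, as in the {utf8, sjis} entries quoted here, and the reverse table assumes SJIS 0x5c decodes to U+005c; this is an illustration, not PostgreSQL's conversion code.

```python
# Hypothetical two-entry excerpt of a UTF8 -> SJIS table. Keys are UTF-8 byte
# sequences packed big-endian into an int, as in the quoted map entries.
UTF8_TO_SJIS = {
    0x5C: 0x5C,    # U+005c REVERSE SOLIDUS -> 0x5c (CP932 behaviour)
    0xC2A5: 0x5C,  # U+00a5 YEN SIGN -> 0x5c (the JIS glyph for 0x5c)
}

# The reverse direction can hold only one UTF-8 value per SJIS code.
# Assumption for illustration: SJIS 0x5c decodes to U+005c.
SJIS_TO_UTF8 = {0x5C: 0x5C}


def shared_targets(table):
    """Return {sjis: [utf8, ...]} for SJIS codes reached from more than one UTF-8 key."""
    sources = {}
    for utf8, sjis in table.items():
        sources.setdefault(sjis, []).append(utf8)
    return {sjis: sorted(keys) for sjis, keys in sources.items() if len(keys) > 1}


def round_trips(utf8):
    """True if UTF-8 -> SJIS -> UTF-8 returns the original value."""
    return SJIS_TO_UTF8[UTF8_TO_SJIS[utf8]] == utf8


if __name__ == "__main__":
    print({hex(s): [hex(u) for u in us] for s, us in shared_targets(UTF8_TO_SJIS).items()})
    print(round_trips(0x5C), round_trips(0xC2A5))
```

Whichever UTF-8 value the reverse table picks for 0x5c, the other one cannot come back unchanged.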
> - {0xc2ac, 0x81ca},
U+00ac (NOT SIGN). Exists in SJIS.
> - {0xe28096, 0x8161},
U+2016 (DOUBLE VERTICAL LINE). Exists in SJIS.
> - {0xe280be, 0x7e},
U+203e (OVERLINE). Mapped to ASCII 0x7e, which is "half width tilde".
> - {0xe28892, 0x817c},
U+2212 (MINUS SIGN). Mapped to "double-width minus sign" in SJIS.
> - {0xe3809c, 0x8160},
U+301c (WAVE DASH). Mapped to "double-width wave dash" in SJIS.
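The code points given for each entry can be checked mechanically by decoding the packed UTF-8 keys. A minimal Python sketch, assuming the keys are written big-endian as in the quoted entries; the helper name is hypothetical. Python's strict UTF-8 decoder also rejects an invalid sequence such as 0xc19c, since 0xC1 can only start an overlong encoding.

```python
import unicodedata

# The extra {utf8, sjis} entries quoted above.
EXTRA_MAPPINGS = [
    (0xC2A5, 0x5C),
    (0xC2AC, 0x81CA),
    (0xE28096, 0x8161),
    (0xE280BE, 0x7E),
    (0xE28892, 0x817C),
    (0xE3809C, 0x8160),
]


def packed_utf8_to_codepoint(packed):
    """Decode a UTF-8 sequence packed big-endian into an int and return its code point.

    Raises UnicodeDecodeError for invalid sequences and ValueError if the
    bytes do not encode exactly one character.
    """
    raw = packed.to_bytes((packed.bit_length() + 7) // 8, "big")
    text = raw.decode("utf-8")  # strict: rejects overlong forms such as b"\xc1\x9c"
    if len(text) != 1:
        raise ValueError(f"{packed:#x} is not a single character")
    return ord(text)


if __name__ == "__main__":
    for utf8, sjis in EXTRA_MAPPINGS:
        cp = packed_utf8_to_codepoint(utf8)
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> SJIS {sjis:#x}")
```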
> Those mappings were added in commit
> a8bd7e1c6e026678019b2f25cffc0a94ce62b24b, back in 2002. The bogus
> mapping for the invalid 0xc19c UTF-8 byte sequence was also added by
> that commit, as well as a few valid mappings that UCS_to_SJIS.pl also
> produces.
>
> I can't judge if those mappings make sense. If we can't find an
> authoritative source for them, I suggest that we leave them as they
> are, but also hard-code them to UCS_to_SJIS.pl, so that running that
> script produces those mappings in utf8_to_sjis.map, even though they
> are not present in the CP932.TXT source file.
Sounds acceptable.
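Hard-coding the extras into the generator could look roughly like the following. This is a Python sketch, not the actual UCS_to_SJIS.pl: it assumes CP932.TXT-style input lines of the form "0x8160<TAB>0xFF5E<TAB>#comment" (SJIS code, Unicode code point, comment), encodes each code point as packed UTF-8, overlays a hypothetical EXTRA_UTF8_TO_SJIS dictionary, and prints entries in the {utf8, sjis}, form quoted above, sorted by UTF-8 value.

```python
# Hypothetical extras forced into the UTF8 -> SJIS direction even though
# CP932.TXT does not list them (values from the entries quoted above).
EXTRA_UTF8_TO_SJIS = {
    0xC2A5: 0x5C,
    0xC2AC: 0x81CA,
    0xE28096: 0x8161,
    0xE280BE: 0x7E,
    0xE28892: 0x817C,
    0xE3809C: 0x8160,
}


def pack_utf8(codepoint):
    """Encode a code point as UTF-8 and pack the bytes big-endian into an int."""
    return int.from_bytes(chr(codepoint).encode("utf-8"), "big")


def parse_cp932_txt(lines):
    """Yield (unicode, sjis) pairs; skip comment lines and codes with no Unicode column."""
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        yield int(fields[1], 16), int(fields[0], 16)


def build_utf8_to_sjis(lines):
    table = {pack_utf8(ucs): sjis for ucs, sjis in parse_cp932_txt(lines)}
    table.update(EXTRA_UTF8_TO_SJIS)  # an extra overrides a source entry with the same UTF-8 key
    return table


def format_map(table):
    return "\n".join(f"{{{utf8:#x}, {sjis:#x}}}," for utf8, sjis in sorted(table.items()))


if __name__ == "__main__":
    sample = [
        "#  Name: cp932 to Unicode table",
        "0x5C\t0x005C\t#REVERSE SOLIDUS",
        "0x8160\t0xFF5E\t#FULLWIDTH TILDE",
        "0x80\t\t#UNDEFINED",
    ]
    print(format_map(build_utf8_to_sjis(sample)))
```

With this sample input, both U+ff5e and U+301c end up mapped to 0x8160, the same many-to-one situation as the YEN SIGN case.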
In summary, the current PostgreSQL UTF8 <--> SJIS mapping is something of a
mixture of SJIS (Shift_JIS) and MS932. There's no cleaner way out of
this situation. I think we need to live with it.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp