Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8
Date: 2020-10-30 03:24:32
Message-ID: 53742.1604028272@sss.pgh.pa.us
Lists: pgsql-hackers

Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> On Fri, Oct 30, 2020 at 9:44 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>> Today while working on some other task related to database encoding, I
>> noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is
>> mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in
>> UTF-8. See below:
>> ...
>> Isn't this a bug?

> Can't tell what reason there was to do that, but there must have been
> some. Maybe the Japanese character sets prefer full-width hyphen
> minus (Unicode U+FF0D) over the mathematical minus sign (U+2212)?

The way it's been explained to me in the past is that the conversion
between Unicode and the various Japanese encodings is not as well
defined as one could wish, because there are multiple quasi-standard
versions of the Japanese encodings. So we shouldn't be too hasty
about changing this. Maybe it's really a bug, but maybe there are
good reasons for the current mapping.

regards, tom lane
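
For readers following the byte sequences quoted above, here is a minimal sketch (not part of the original message) that prints the UTF-8 bytes of the two candidate code points, U+2212 and U+FF0D. The helper name `describe` is hypothetical. The last lines decode a1-dd with Python's own `euc_jp` codec purely for comparison; that codec is a separate implementation from PostgreSQL's conversion tables, so its result says nothing definitive about what PostgreSQL does or should do.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME utf8=xx-xx-xx' for a single character."""
    utf8_hex = "-".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} utf8={utf8_hex}"


# The two candidate targets discussed in the thread.
print(describe("\u2212"))  # U+2212 MINUS SIGN utf8=e2-88-92
print(describe("\uff0d"))  # U+FF0D FULLWIDTH HYPHEN-MINUS utf8=ef-bc-8d

# Assumption: Python's euc_jp codec is used here only as one more data point.
# It is independent of PostgreSQL's mapping, so its answer may differ.
decoded = b"\xa1\xdd".decode("euc_jp")
print(describe(decoded))
```

Comparing the last line against the first two shows which of the two mappings a given converter picked, which is the kind of disagreement between implementations the message above is cautioning about.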
