Re: Supporting SJIS as a database encoding

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com
Cc: hlinnaka(at)iki(dot)fi, tgl(at)sss(dot)pgh(dot)pa(dot)us, ishii(at)sraoss(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Supporting SJIS as a database encoding
Date: 2016-09-07 07:13:04
Message-ID: 20160907.161304.112519789.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Tue, 6 Sep 2016 03:43:46 +0000, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote in <0A3221C70F24FB45833433255569204D1F5E66CE(at)G01JPEXMBYT05>
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kyotaro
> > HORIGUCHI
> > Implementing radix tree code, then redefining the format of mapping table
> > to support radix tree, then modifying mapping generator script are needed.
> >
> > If no one opposes this, I'll do that.
>
> +100
> Great analysis, and I admire your guts. I very much appreciate your attempt!

Thanks. By the way, there's another issue related to SJIS
conversion. MS932 has several characters that are assigned to
multiple code points. Converting text in this encoding to Unicode
and back causes a round-trip problem. For example,

8754 (ROMAN NUMERAL ONE in NEC specials)
=> U+2160 (ROMAN NUMERAL ONE)
=> FA4A (ROMAN NUMERAL ONE in IBM extension)

By my count, 398 characters are affected by this kind of
replacement. In addition, "GAIJI" (private use area) characters
are not allowed. Does this meet your needs?
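
The round-trip problem can be reproduced outside PostgreSQL. The
following is only a rough sketch that uses Python's built-in "cp932"
codec, not PostgreSQL's conversion tables, so the helper names
(round_trip, lossy_double_byte_codes) are made up for illustration and
the results need not match the numbers above. In particular, Python's
codec appears to prefer the NEC code when encoding U+2160, so the code
that gets replaced there is FA4A rather than 8754, and the count it
reports may differ from 398.

```python
# Sketch only: this exercises Python's cp932 codec, NOT PostgreSQL's mapping.


def round_trip(code: bytes, encoding: str = "cp932") -> bytes:
    """Decode a byte sequence to Unicode and encode it back."""
    return code.decode(encoding).encode(encoding)


def lossy_double_byte_codes(encoding: str = "cp932") -> list:
    """Return every two-byte code that changes after a round trip."""
    # Shift_JIS-style lead and trail byte ranges (assumed layout for cp932).
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))
    lossy = []
    for lead in lead_bytes:
        for trail in trail_bytes:
            code = bytes([lead, trail])
            try:
                if round_trip(code, encoding) != code:
                    lossy.append(code)
            except UnicodeError:
                # Code is unassigned in this codec; nothing to compare.
                continue
    return lossy


if __name__ == "__main__":
    nec = bytes.fromhex("8754")  # ROMAN NUMERAL ONE in NEC specials
    ibm = bytes.fromhex("FA4A")  # ROMAN NUMERAL ONE in IBM extension
    # Both codes decode to the same Unicode character, so at most one
    # of them can come back unchanged.
    print(nec.decode("cp932") == ibm.decode("cp932"))
    print(round_trip(nec).hex(), round_trip(ibm).hex())
    print(len(lossy_double_byte_codes()))
```

Which of the duplicate codes survives depends entirely on the mapping
table in use, so the same check would have to be run against the
server's own conversion to confirm the figures given here.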

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
