Re: Proposal: Adding JIS X 0213 support

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding JIS X 0213 support
Date: 2007-03-25 12:32:50
Message-ID: 20070325.213250.46334984.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> > Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> > >> I'm confused. If this is exactly the same as EUC_JP, why do we need
> > >> any new code at all?
> >
> > > I said the *encoding scheme* is the same, not that the contents
> > > (character set) are the same. In other words, the characters included
> > > in EUC_JP are not the same as those in EUC_JIS_2004.
> >
> > I'm still confused. If the set of characters is different, then surely
> > we need at least a different UTF8<->EUC_JIS_2004 conversion function?
>
> Yes, exactly. I will come up with new conversions later.

I have committed changes to add JIS X 0213 support, along with new conversions.

New encodings:

EUC_JIS_2004: JIS X 0213 encoded in EUC
SHIFT_JIS_2004: JIS X 0213 encoded in Shift JIS (client-only encoding)

These encodings support the following character sets:

ASCII, JIS X 0201 (single-byte "katakana"), and JIS X 0213 planes 1 and 2
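As a rough illustration of how these character sets share one EUC_JIS_2004 byte stream: an ASCII byte stands alone, JIS X 0201 katakana is prefixed by SS2 (0x8E), a JIS X 0213 plane 1 character is two bytes in 0xA1-0xFE, and a plane 2 character is SS3 (0x8F) followed by two such bytes. The Python sketch below is not PostgreSQL's C code; `euc_jis_2004_char_len` and `split_chars` are hypothetical names, and the check is structural only (it does not verify that a code is actually assigned, e.g. plane 2 uses only some rows).

```python
def euc_jis_2004_char_len(buf: bytes, i: int) -> int:
    """Byte length of the EUC_JIS_2004 character starting at buf[i].

    Hypothetical helper. Structural check only: it does not verify that the
    code is assigned. Raises ValueError for an invalid or truncated sequence.
    """
    def gr94(pos: int) -> bool:
        # 0xA1-0xFE is one row or cell byte of a JIS X 0213 plane
        return pos < len(buf) and 0xA1 <= buf[pos] <= 0xFE

    b = buf[i]
    if b < 0x80:                          # ASCII: one byte
        return 1
    if b == 0x8E:                         # SS2 + JIS X 0201 katakana (0xA1-0xDF)
        if i + 1 < len(buf) and 0xA1 <= buf[i + 1] <= 0xDF:
            return 2
    elif b == 0x8F:                       # SS3 + two bytes: JIS X 0213 plane 2
        if gr94(i + 1) and gr94(i + 2):
            return 3
    elif gr94(i) and gr94(i + 1):         # two bytes: JIS X 0213 plane 1
        return 2
    raise ValueError(f"invalid EUC_JIS_2004 sequence at offset {i}")


def split_chars(buf: bytes) -> list:
    """Split EUC_JIS_2004 bytes into one bytes object per character."""
    chars, i = [], 0
    while i < len(buf):
        n = euc_jis_2004_char_len(buf, i)
        chars.append(buf[i:i + n])
        i += n
    return chars
```

For example, `split_chars(b"A\x8e\xb1\xa4\xa2\x8f\xa1\xa1")` yields pieces of 1, 2, 2 and 3 bytes: ASCII, katakana, plane 1 and plane 2.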

New conversions:

EUC_JIS_2004 --> UTF8: euc_jis_2004_to_utf8
UTF8 --> EUC_JIS_2004: utf8_to_euc_jis_2004
SHIFT_JIS_2004 --> UTF8: shift_jis_2004_to_utf8
UTF8 --> SHIFT_JIS_2004: utf8_to_shift_jis_2004
EUC_JIS_2004 --> SHIFT_JIS_2004: euc_jis_2004_to_shift_jis_2004
SHIFT_JIS_2004 --> EUC_JIS_2004: shift_jis_2004_to_euc_jis_2004
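Because EUC_JIS_2004 and SHIFT_JIS_2004 encode the same plane/row/cell ("kuten") positions, the conversion between them can be done arithmetically, without a map. The Python sketch below only restates the JIS X 0213 arithmetic and is not the committed C code; all function names are hypothetical. Plane 2 is the irregular part: only rows 1, 3, 4, 5, 8, 12-15 and 78-94 have Shift JIS codes, and lead bytes 0xF0-0xF4 pair those rows explicitly. Note also that a Shift JIS trail byte can fall in 0x40-0x7E, overlapping ASCII, which is the usual reason Shift JIS variants are supported only as client encodings.

```python
# Plane 2 rows served by Shift JIS lead bytes 0xF0-0xF4, as (odd-half row,
# even-half row) pairs; rows 79-94 use lead bytes 0xF5-0xFC regularly.
PLANE2_PAIRS = {0xF0: (1, 8), 0xF1: (3, 4), 0xF2: (5, 12),
                0xF3: (13, 14), 0xF4: (15, 78)}
PLANE2_LEAD = {row: lead for lead, rows in PLANE2_PAIRS.items() for row in rows}


def kuten_to_sjis(plane: int, row: int, cell: int) -> bytes:
    """Hypothetical helper: JIS X 0213 plane/row/cell -> two Shift JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row or cell out of range")
    if plane == 1:
        lead = (row + 0x101) >> 1 if row <= 62 else (row + 0x181) >> 1
    elif row in PLANE2_LEAD:
        lead = PLANE2_LEAD[row]
    elif row >= 79:
        lead = (row + 0x19B) >> 1
    else:
        raise ValueError(f"plane 2 row {row} has no Shift JIS code")
    if row % 2:  # odd row: trail 0x40-0x7E, 0x80-0x9E (0x7F is skipped)
        trail = cell + 0x3F if cell <= 63 else cell + 0x40
    else:        # even row: trail 0x9F-0xFC
        trail = cell + 0x9E
    return bytes((lead, trail))


def sjis_to_kuten(lead: int, trail: int) -> tuple:
    """Hypothetical helper: inverse of kuten_to_sjis."""
    if not 0x40 <= trail <= 0xFC or trail == 0x7F:
        raise ValueError("invalid trail byte")
    if 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xEF:
        plane = 1
        first = (lead - 0x81) * 2 + 1 if lead <= 0x9F else (lead - 0xC1) * 2 + 1
        rows = (first, first + 1)
    elif 0xF0 <= lead <= 0xFC:
        plane = 2
        first = (lead - 0xF5) * 2 + 79    # only used for leads 0xF5-0xFC
        rows = PLANE2_PAIRS.get(lead, (first, first + 1))
    else:
        raise ValueError("invalid lead byte")
    if trail >= 0x9F:
        return plane, rows[1], trail - 0x9E
    cell = trail - 0x3F if trail < 0x7F else trail - 0x40
    return plane, rows[0], cell


def euc_to_sjis_2004(src: bytes) -> bytes:
    """EUC_JIS_2004 -> SHIFT_JIS_2004 (truncated input raises IndexError)."""
    out, i = bytearray(), 0
    while i < len(src):
        b = src[i]
        if b < 0x80:                      # ASCII is copied unchanged
            out.append(b)
            i += 1
        elif b == 0x8E:                   # SS2 + katakana -> one byte 0xA1-0xDF
            if not 0xA1 <= src[i + 1] <= 0xDF:
                raise ValueError("invalid katakana byte")
            out.append(src[i + 1])
            i += 2
        elif b == 0x8F:                   # SS3 + two bytes: plane 2
            out += kuten_to_sjis(2, src[i + 1] - 0xA0, src[i + 2] - 0xA0)
            i += 3
        else:                             # two bytes: plane 1
            out += kuten_to_sjis(1, b - 0xA0, src[i + 1] - 0xA0)
            i += 2
    return bytes(out)


def sjis_2004_to_euc(src: bytes) -> bytes:
    """SHIFT_JIS_2004 -> EUC_JIS_2004 (truncated input raises IndexError)."""
    out, i = bytearray(), 0
    while i < len(src):
        b = src[i]
        if b < 0x80:
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xDF:           # one-byte katakana -> SS2 + byte
            out += bytes((0x8E, b))
            i += 1
        else:
            plane, row, cell = sjis_to_kuten(b, src[i + 1])
            if plane == 2:
                out.append(0x8F)          # SS3 introduces plane 2
            out += bytes((row + 0xA0, cell + 0xA0))
            i += 2
    return bytes(out)
```

Going through the kuten position keeps both directions symmetric, so the round trip can be checked exhaustively over every row that has a code.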

To generate the conversion maps, I have created two Perl scripts,
UCS_to_SHIFT_JIS_2004.pl and UCS_to_EUC_JIS_2004.pl, which use
sjis-0213-2004-std.txt and euc-jis-2004-std.txt, respectively, as the
source of the conversion specification. These files are freely
available from http://x0213.org.
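The mapping files pair each local code with one Unicode code point, or with two for a combined character. A Python sketch of the parsing step such a script has to perform, assuming data lines of the form `0xA4F7<TAB>U+304B+309A<TAB># comment` (the exact layout should be checked against the downloaded files). `parse_x0213_table` and `utf8_as_int` are hypothetical names, and packing UTF-8 bytes into one integer is an assumption about how the generated C maps store them.

```python
import re

# Assumed data-line shape: "0x<hex>  U+<hex>[+<hex>]  # comment".
# Lines without a U+ field (e.g. reserved codes) do not match and are skipped.
LINE = re.compile(
    r"^0x([0-9A-Fa-f]+)\s+U\+([0-9A-Fa-f]+)(?:\+([0-9A-Fa-f]+))?\s*(?:#.*)?$")


def parse_x0213_table(lines):
    """Hypothetical parser: return (single, combined) maps keyed by local code."""
    single, combined = {}, {}
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue                      # header comment, blank or unmapped line
        code, first, second = m.groups()
        if second is None:
            single[int(code, 16)] = int(first, 16)
        else:                             # "combined character": two code points
            combined[int(code, 16)] = (int(first, 16), int(second, 16))
    return single, combined


def utf8_as_int(code_point: int) -> int:
    """Pack the UTF-8 bytes of a code point into one integer (assumed C map style)."""
    return int.from_bytes(chr(code_point).encode("utf-8"), "big")
```

Keeping the combined entries in their own dictionary matches the idea that they need separate handling during conversion; `utf8_as_int(0x3000)` gives `0xE38080`.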

Conversions between UTF-8 and EUC_JIS_2004 or SHIFT_JIS_2004 require
support for UTF-8 "combined characters", i.e. a single logical
character that consists of two UTF-8 characters. To implement this, I
have modified LocalToUtf() and UtfToLocal() by adding a new parameter,
a "combined character map".
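The key point of a combined character map is that the UTF-8 to local direction must look one code point ahead: U+304B alone is an ordinary character, but U+304B followed by U+309A is a single JIS X 0213 character (euc-jis-2004-std.txt lists it as 0xA4F7). A minimal Python sketch of that lookahead, with hypothetical miniature tables and function names; it mirrors the idea only, not the C signatures of LocalToUtf() and UtfToLocal().

```python
# Hypothetical miniature tables; generated tables would hold thousands of
# entries. Each local code maps to one code point or, if combined, to two.
LOCAL_TO_UCS = {
    0x41: (0x41,),                 # "A"
    0xA4AB: (0x304B,),             # HIRAGANA LETTER KA
    0xA4F7: (0x304B, 0x309A),      # KA + COMBINING SEMI-VOICED SOUND MARK
}
UCS_TO_LOCAL = {ucs: code for code, ucs in LOCAL_TO_UCS.items()}
PAIR_STARTS = {ucs[0] for ucs in LOCAL_TO_UCS.values() if len(ucs) == 2}


def utf8_to_local(data: bytes) -> list:
    """UTF-8 -> local codes, preferring a combined entry when one matches."""
    text = data.decode("utf-8")
    codes, i = [], 0
    while i < len(text):
        cp = ord(text[i])
        if cp in PAIR_STARTS and i + 1 < len(text):
            pair = (cp, ord(text[i + 1]))
            if pair in UCS_TO_LOCAL:      # look-ahead hit: two code points, one char
                codes.append(UCS_TO_LOCAL[pair])
                i += 2
                continue
        if (cp,) not in UCS_TO_LOCAL:
            raise ValueError(f"U+{cp:04X} has no mapping")
        codes.append(UCS_TO_LOCAL[(cp,)])
        i += 1
    return codes


def local_to_utf8(codes: list) -> bytes:
    """Local codes -> UTF-8; a combined entry expands to two code points."""
    return "".join(chr(cp) for code in codes for cp in LOCAL_TO_UCS[code]).encode("utf-8")
```

Without the lookahead, U+304B U+309A would come out as 0xA4AB followed by a U+309A that this miniature table cannot map; with it, the pair becomes the single code 0xA4F7.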

Documentation changes and regression test changes have been committed too.

Beware that I have updated the catalog version. Please run initdb.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
