Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW

From: "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] again: Bug #943: Server-Encoding from EUC_TW
Date: 2003-06-24 18:28:24
Message-ID: 3EF89848.E00F0EF@wincor-nixdorf.com
Lists: pgsql-bugs pgsql-hackers

Tatsuo Ishii wrote:
>
> > > > I reported bug #943 (which I found in 7.3.2), and you checked in a change to fix an integer overflow.
> > > > Now I have upgraded to 7.3.3, and I'm not happy with the result.
> > > > The exact error I described is fixed, but I found new errors in the UTF-8 <-> EUC_TW and UTF-8 <-> BIG5 conversions:
> > > >
> > > > Copying from a file into a table (the DB has UTF-8 encoding):
> > > > for PGCLIENTENCODING=BIG5:
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> > >
> > > I see no problem here. The only standard conversion map I could find
> > > online so far (see the URL below) does not include entries 0xf9d6 or
> > > above.
> > >
> > > http://www.unicode.org/Public/UNIDATA/Unihan.txt
> >
> >
> > I found these in that file:
> > U+F9D7 in line 604519
> > U+F9D8 in line 219540
> > U+F9D6...U+F9DB in lines 730707...730766.
>
> No. U+F9D6 means *Unicode* code point, not BIG5 code point.
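The distinction Tatsuo draws can be shown in a few lines of Python (used here purely for illustration). "U+F9D6" in Unihan.txt names a single Unicode character in the CJK Compatibility Ideographs block, while "0xf9d6" in the COPY warning is a two-byte BIG5 sequence; the two only share digits.

```python
import unicodedata

# U+F9D6 is a Unicode code point: one character in the
# CJK Compatibility Ideographs block (U+F900..U+FAFF).
code_point = chr(0xF9D6)

# 0xf9d6 in the BIG5 warning is a pair of encoded bytes, not a code point.
big5_bytes = bytes([0xF9, 0xD6])

print(unicodedata.name(code_point))      # CJK COMPATIBILITY IDEOGRAPH-F9D6
print(code_point.encode("utf-8").hex())  # efa796
print(len(big5_bytes))                   # 2
```

So finding "U+F9D6" in Unihan.txt says nothing about which character the BIG5 bytes 0xF9 0xD6 stand for.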

Ok.
I looked on my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:
% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh(at)linux(dot)org(dot)tw>
% Yuan-Chung Cheng <platin(at)ms31(dot)hinet(dot)net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...

My characters are in there.
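A quick way to test this claim is to decode the four failing BIG5 codes with a CP950 table and compare the results with the four UTF-8 sequences from the COPY TO warnings further down. The sketch below assumes that Python's `cp950` codec follows Microsoft's CP950.TXT, the source of the glibc charmap quoted above; it is not PostgreSQL's converter, and `big5_via_cp950` and `utf8` are hypothetical helper names.

```python
# BIG5 codes from the COPY FROM warnings (PGCLIENTENCODING=BIG5), in order
COPY_IN_BIG5 = ["f9d6", "f9d7", "f9d8", "f9db"]
# UTF-8 sequences from the COPY TO BIG5 warnings, in order
COPY_OUT_UTF8 = ["e7a281", "e98ab9", "e8a38f", "e7b2a7"]


def big5_via_cp950(hex_bytes: str) -> str:
    """Decode a BIG5 byte pair using Python's cp950 codec (assumed CP950.TXT-based)."""
    return bytes.fromhex(hex_bytes).decode("cp950")


def utf8(hex_bytes: str) -> str:
    """Decode a UTF-8 byte sequence given as hex."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


if __name__ == "__main__":
    for b5, u8 in zip(COPY_IN_BIG5, COPY_OUT_UTF8):
        ch = big5_via_cp950(b5)
        same = ch == utf8(u8)
        print(f"BIG5 0x{b5} -> U+{ord(ch):04X} {ch}; equals UTF-8 0x{u8}: {same}")
```

If that mapping holds, the characters that fail on COPY FROM are the same ones that fail on COPY TO BIG5. Both directions would then point at one gap: a BIG5 table without the CP950 entries from 0xF9D6 up.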

Don't you agree that it is strange that (for EUC_TW) I can copy "to" a file without error,
but I cannot copy "from" a file without error?

Michael

> >
> > > > for EUC_TW
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
> > >
> > > Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
> > > supports only:
> > >
> > > CNS 11643-1993, plane 0
> > > CNS 11643-1993, plane 1
> > > CNS 11643-1993, plane 2
> > > CNS 11643-1993, plane 15
> > >
> > > Would you like to have support for the remaining CNS 11643-1993 planes:
> > >
> > > CNS 11643-1993, plane 3
> > > CNS 11643-1993, plane 4
> > > CNS 11643-1993, plane 5
> > > CNS 11643-1993, plane 6
> > > CNS 11643-1993, plane 7
> > >
> > > in the upcoming 7.4 release?
> > >
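The plane-3 diagnosis can be read directly from the bytes. In EUC-TW, a two-byte character (both bytes 0xA1..0xFE) is always CNS 11643 plane 1. A four-byte character starts with SS2 (0x8E), followed by a plane byte (0xA1 = plane 1 ... 0xB0 = plane 16) and two row/cell bytes. The sketch below decodes that layout. `decode_euc_tw` is a hypothetical helper, not PostgreSQL's converter, and the supported-plane set simply copies the list in the reply above.

```python
SS2 = 0x8E  # single-shift 2: introduces a 4-byte EUC-TW sequence

# Planes PostgreSQL handles, as listed in the reply above
SUPPORTED_PLANES = {0, 1, 2, 15}


def is_gr(b: int) -> bool:
    """True for a byte in the 0xA1..0xFE range used for EUC-TW row/cell bytes."""
    return 0xA1 <= b <= 0xFE


def decode_euc_tw(seq: bytes) -> tuple[int, int]:
    """Hypothetical helper: return (CNS 11643 plane, 7-bit row/cell code)
    for exactly one EUC-TW character."""
    if len(seq) == 2 and is_gr(seq[0]) and is_gr(seq[1]):
        plane, hi, lo = 1, seq[0], seq[1]  # 2-byte form is always plane 1
    elif (len(seq) == 4 and seq[0] == SS2 and 0xA1 <= seq[1] <= 0xB0
          and is_gr(seq[2]) and is_gr(seq[3])):
        plane, hi, lo = seq[1] - 0xA0, seq[2], seq[3]  # 0xA1 -> 1 ... 0xB0 -> 16
    else:
        raise ValueError(f"not a single EUC-TW character: {seq.hex()}")
    # Clearing the high bit of each byte gives the 94x94 code (0x2121..0x7E7E)
    return plane, ((hi & 0x7F) << 8) | (lo & 0x7F)


if __name__ == "__main__":
    for h in ["8ea3c3b7", "8ea3cfd0", "8ea3c4ce", "8ea3bdfe"]:
        plane, code = decode_euc_tw(bytes.fromhex(h))
        status = "supported" if plane in SUPPORTED_PLANES else "not supported"
        print(f"0x{h}: plane {plane}, CNS 0x{code:04X} ({status})")
```

All four sequences from the warnings carry the plane byte 0xA3, so each one falls in plane 3, outside the supported set.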
> > > > Copying from a table (UTF-8 data) out to a file:
> > > > to BIG5
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
> > > >
> > > > to EUC_TW is ok!
> > >
> > > BIG5 and EUC_TW have different code points, so this is not very strange.
> >
> >
> > But it is very strange that (for EUC_TW) I can copy to a file without error, but I cannot copy from a file without error.
> >
> > Michael
> >
