Re: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work

From: "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work
Date: 2003-04-14 12:19:18
Message-ID: 3E9AA746.2E07B899@wincor-nixdorf.com
Lists: pgsql-bugs pgsql-hackers

I also tried BIG5-encoded data (Traditional Chinese for Taiwan) and got warnings:
WARNING: copy: line 4586, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
...
Is this also solved with this fix?

Michael

Tatsuo Ishii wrote:
>
> It turned out to be a bug in the encoding conversion engine of
> PostgreSQL. It simply failed to find the proper entry in an encoding
> conversion table because of an integer overflow problem. Since only
> the maps for EUC_TW have such huge code point values (for example
> 0x8eaee7aa), I believe the conversion failure occurs only with this
> particular encoding. The included patches should solve the problem
> (they are against PostgreSQL 7.3.2).
>
> BTW, I'm surprised to find the bug, since it has been there since the
> 7.2 days.
>
> I'm going to commit the fix to both current and 7.3-stable trees.
> --
> Tatsuo Ishii
>
> > Short Description
> > Server-Encoding from EUC_TW to UTF-8 doesn't work
> >
> > Long Description
> > System: SuSE Linux 8.1, kernel 2.4.19, glibc 2.2.5/glibc-locale 2.2.5
> > The same error occurs on RedHat 7.3, kernel 2.4.20, glibc 2.2.5
> > postgresql version 7.3.2
> > description: I loaded Chinese (TW) characters, encoded as UTF-8, into a
> > database which has UTF-8 encoding with "copy table from 'original'" in psql. Ok.
> > Then I exited psql and exported PGCLIENTENCODING=EUC_TW.
> > I started psql again and ran "copy table to 'file.EUC_TW'". Ok.
> > If I convert this file to UTF-8 with iconv -f EUC-TW -t UTF-8 file.EUC_TW > file.UTF-8
> > then file.UTF-8 looks exactly the same as the original.
> > That means PostgreSQL converts from UTF-8 to EUC_TW correctly.
> > Now I load the exported file 'file.EUC_TW' back into the DB:
> > "copy table2 from 'file.EUC_TW'". I had not exited psql in between, so
> > PGCLIENTENCODING is the same as for "copy to".
> > Now I get an error telling me: "copy: line 1, LocalToUtf: could not convert (0xe5b5) EUC_TW to UTF-8" ... and the characters are missing in table2
> >
> > Sample Code
> > UTF-8:
> > 00000000: e795 b6e6 97a5 0ae5 959f e58b 95e4 b8ad
> > 00000010: 2ce4 bd86 e69c 89e9 8caf e8aa a40a
> >
> > EUC_TW as exported from PostgreSQL, which could not be imported again:
> > 00000000: e5b5 c5ca 0ada f6d9 afc4 e32c c8fe c8b4
> > 00000010: f2e3 eba8 0a
>
> *** src/backend/utils/mb/conv.c.orig 2003-04-12 10:03:25.000000000 +0900
> --- src/backend/utils/mb/conv.c 2003-04-12 10:16:04.000000000 +0900
> ***************
> *** 313,319 ****
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_utf_to_local *) p2)->utf;
> !     return (v1 - v2);
>   }
>
>   /*
> --- 313,319 ----
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_utf_to_local *) p2)->utf;
> !     return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
>
>   /*
> ***************
> *** 328,334 ****
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_local_to_utf *) p2)->code;
> !     return (v1 - v2);
>   }
>
>   /*
> --- 328,334 ----
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_local_to_utf *) p2)->code;
> !     return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
>
>   /*
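
For readers following the patch, here is a minimal, hedged sketch of why a subtraction-based comparator can mislead a binary search once code values exceed INT_MAX. It is not the PostgreSQL source. The struct `map_entry`, the table `demo_table`, the helper `lookup` and both comparator names are hypothetical, the small codes 0xa1a1 and 0xa1a2 are arbitrary, and the `utf` fields are placeholders rather than real mappings. Only the value 0x8eaee7aa is taken from the message above. The sketch also assumes a 32-bit `int` and the usual wrap-around when an out-of-range unsigned value is converted to `int` (that conversion is implementation-defined in C).

```c
#include <stdlib.h>

/* Hypothetical stand-in for a conversion-table entry (not pg_local_to_utf). */
typedef struct
{
    unsigned int code;  /* local multibyte code, used as the sort key */
    unsigned int utf;   /* placeholder only; real mapping values are not reproduced */
} map_entry;

/* Table sorted in ascending order of code, as bsearch() requires. */
const map_entry demo_table[] = {
    {0x0000a1a1U, 0},
    {0x0000a1a2U, 0},
    {0x8eaee7aaU, 0}    /* large code value quoted in the message */
};

/*
 * Old style: return the difference. For key 0x8eaee7aa and entry 0xa1a1 the
 * unsigned difference is 0x8eae4609, which exceeds INT_MAX; converted to int
 * it typically becomes negative, so the key appears smaller than the entry.
 */
int
cmp_subtract(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (int) (v1 - v2);
}

/* Patched style: an explicit three-way comparison that cannot overflow. */
int
cmp_three_way(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const map_entry *) p2)->code;

    return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1);
}

/* Look up a code in demo_table with the given comparator; NULL if not found. */
const map_entry *
lookup(unsigned int key, int (*cmp) (const void *, const void *))
{
    return bsearch(&key, demo_table,
                   sizeof(demo_table) / sizeof(demo_table[0]),
                   sizeof(demo_table[0]), cmp);
}
```

Calling `lookup(0x8eaee7aaU, cmp_three_way)` returns a pointer to the last entry. With `cmp_subtract`, a binary search that first probes a small entry is steered toward the lower half of the table and can report the key as missing, which would match the "could not convert" symptom described in the report.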
