Re: A Patch for MIC to EUC_TW code converting in mb support

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: pgman(at)candle(dot)pha(dot)pa(dot)us
Cc: cch(at)cc(dot)kmu(dot)edu(dot)tw, pgsql-patches(at)postgresql(dot)org
Subject: Re: A Patch for MIC to EUC_TW code converting in mb support
Date: 2001-01-23 01:04:12
Message-ID: 20010123100412W.t-ishii@sra.co.jp
Lists: pgsql-docs pgsql-hackers pgsql-patches

> Tatsuo, I assume these are all done in 7.1, right?

Yes.
--
Tatsuo Ishii

> > > ============================================================================
> > > POSTGRESQL BUG REPORT: MIC to EUC_TW code converting in mb support
> > > ============================================================================
> > >
> > > System Configuration
> > > ---------------------
> > > Architecture (example: Intel Pentium) :x86
> > > Operating System (example: Linux 2.0.26 ELF) :Linux 2.2.x and FreeBSD 3.5R
> > > PostgreSQL version (example: PostgreSQL-7.0) :PostgreSQL-7.0.2
> > > Compiler used (example: gcc 2.8.0) :egcs-2.91.66, gcc 2.7.3
> > >
> > > A FULL description of the problem:
> > > ------------------------------------------------
> > > In PostgreSQL mb (multi-byte) support, there is a bug in the code
> > > conversion from MIC to EUC_TW. The original mic2euc_tw() in conv.c
> > > converts CNS 11643-1992 Plane 2 characters into the 2-byte EUC_TW
> > > encoding, but characters in CNS 11643-1992 Plane 2 should be converted
> > > into the 4-byte EUC_TW encoding instead.
> > >
> > > A way to repeat the problem:
> > > ----------------------------------------------------------------------
> > > When you initdb with -E EUC_TW and set PGCLIENTENCODING to BIG5,
> > > you will find all the characters in CNS 11643-1992 Plane 2 are
> > > incorrectly stored or output.
> > >
> > > This problem might be fixed by the solution in the attachment.
> >
> > Thanks for pointing it out. Your fix seems correct.
> >
> > BTW I have found another bug in EUC_TW support, at line 917 in conv.c:
> >
> > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> >
> > this should be:
> >
> > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> >
> > Otherwise, CNS 11643-1992 Plane 3 and above won't work. Could you test
> > it out with characters from CNS 11643-1992 Plane 3 or above?
> >
> > If they are ok, I will fix the current source and make a patch for
> > 7.0.3 (I guess it's too late to back-patch the 7.0 tree).
> > --
> > Tatsuo Ishii
> >
>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
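
For readers following the thread, here is a minimal sketch in C of the two fixes discussed above. It is not the actual conv.c code. The function name mic_char_to_euc_tw() is hypothetical, and the constant values are assumptions modeled on the names used in PostgreSQL's multi-byte headers, so check them against your source tree. The sketch assumes that a Plane 2 character in MIC is a one-byte leading code followed by two bytes, and that Plane 3 and above use a private-charset prefix byte followed by the leading code. Under those assumed values, the old expression c1 - LC_CNS11643_3 + 0xa3 uses the prefix byte rather than the leading code and would give 0x9d - 0xf6 + 0xa3 = 0x4a, which is not a valid EUC_TW plane byte.

```c
#include <stddef.h>

/* Assumed values; verify against the multi-byte headers in your tree. */
#define SS2            0x8e   /* EUC single-shift 2, starts a 4-byte sequence */
#define LC_CNS11643_1  0x95   /* MIC leading code, CNS 11643 Plane 1 */
#define LC_CNS11643_2  0x96   /* MIC leading code, CNS 11643 Plane 2 */
#define LCPRV2_B       0x9d   /* MIC prefix for private 2-byte charsets */
#define LC_CNS11643_3  0xf6   /* leading code after the prefix, Plane 3 */
#define LC_CNS11643_7  0xfa   /* leading code after the prefix, Plane 7 */

/*
 * Hypothetical helper: convert one complete MIC character at 'mic' into
 * EUC_TW at 'out' (room for 4 bytes). Returns the number of bytes written
 * and stores the number of MIC bytes read in *consumed. Returns 0 if the
 * character is not a CNS 11643 character. The caller must pass a complete
 * character; no bounds checking is done here.
 */
static size_t
mic_char_to_euc_tw(const unsigned char *mic, unsigned char *out,
                   size_t *consumed)
{
    unsigned char c1 = mic[0];

    if (c1 == LC_CNS11643_1)
    {
        /* Plane 1: plain 2-byte EUC_TW */
        out[0] = mic[1];
        out[1] = mic[2];
        *consumed = 3;
        return 2;
    }
    if (c1 == LC_CNS11643_2)
    {
        /* Plane 2: 4-byte EUC_TW (SS2, plane byte 0xa2, then two bytes) */
        out[0] = SS2;
        out[1] = 0xa2;
        out[2] = mic[1];
        out[3] = mic[2];
        *consumed = 3;
        return 4;
    }
    if (c1 == LCPRV2_B &&
        mic[1] >= LC_CNS11643_3 && mic[1] <= LC_CNS11643_7)
    {
        /*
         * Plane 3 and above: the plane byte is derived from the byte
         * after the prefix (the "*mic++" of the fix), not from c1.
         */
        out[0] = SS2;
        out[1] = (unsigned char) (mic[1] - LC_CNS11643_3 + 0xa3);
        out[2] = mic[2];
        out[3] = mic[3];
        *consumed = 4;
        return 4;
    }

    *consumed = 0;
    return 0;
}
```

With these assumed values, a Plane 2 input of 0x96 0xa1 0xa1 becomes 0x8e 0xa2 0xa1 0xa1, and a Plane 3 input of 0x9d 0xf6 0xa1 0xa2 becomes 0x8e 0xa3 0xa1 0xa2.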
