Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: cch(at)cc(dot)kmu(dot)edu(dot)tw, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport
Date: 2000-11-16 06:01:30
Message-ID: 200011160601.BAA02070@candle.pha.pa.us
Lists: pgsql-docs pgsql-hackers pgsql-patches

Can someone tell me where we are on this? Tatsuo, I think you said you
wanted to apply this fix.

> [Cced to hackers list]
>
> > > BTW I have found another bug in the EUC_TW support. Line 917 in conv.c:
> > >
> > > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> > >
> > > this should be:
> > >
> > > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> > >
> > > Otherwise, CNS 11643-1992 Plane 3 or higher won't work. Could you test
> > > it out with CNS 11643-1992 Plane 3 or higher?
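
The fix quoted above can be sketched as a single-character converter. This is a minimal illustration, not the actual conv.c code: the function name mic_char_to_euc_tw is hypothetical, and the numeric values given to SS2, LCPRV2_B and the LC_CNS11643_* constants are assumptions made for the example. It assumes that a Plane 3 or higher character in MULE internal code starts with a private-charset prefix byte followed by the charset id, so the plane byte has to be computed from the second byte rather than from the first byte c1.

```c
#include <stddef.h>

/* Assumed values for this sketch; the real definitions live in the
 * PostgreSQL multibyte headers and may differ. */
#define SS2            0x8e  /* EUC single shift 2 */
#define LC_CNS11643_1  0x95  /* CNS 11643-1992 Plane 1 */
#define LC_CNS11643_2  0x96  /* CNS 11643-1992 Plane 2 */
#define LCPRV2_B       0x9d  /* prefix for private two-byte charsets */
#define LC_CNS11643_3  0xf6  /* CNS 11643-1992 Plane 3 */
#define LC_CNS11643_7  0xfa  /* CNS 11643-1992 Plane 7 */

/*
 * Convert one MULE internal code character at mic into EUC_TW at out
 * (out must have room for 4 bytes). Returns the number of bytes written
 * and stores the number of input bytes read in *consumed. Returns 0 for
 * a sequence this sketch does not handle.
 */
static size_t
mic_char_to_euc_tw(const unsigned char *mic, unsigned char *out,
                   size_t *consumed)
{
    unsigned char c1 = mic[0];

    if (c1 < 0x80)
    {
        /* plain ASCII passes through unchanged */
        out[0] = c1;
        *consumed = 1;
        return 1;
    }
    if (c1 == LC_CNS11643_1)
    {
        /* Plane 1: drop the leading charset byte */
        out[0] = mic[1];
        out[1] = mic[2];
        *consumed = 3;
        return 2;
    }
    if (c1 == LC_CNS11643_2)
    {
        /* Plane 2: SS2, plane byte 0xa2, then the two code bytes */
        out[0] = SS2;
        out[1] = 0xa2;
        out[2] = mic[1];
        out[3] = mic[2];
        *consumed = 3;
        return 4;
    }
    if (c1 == LCPRV2_B &&
        mic[1] >= LC_CNS11643_3 && mic[1] <= LC_CNS11643_7)
    {
        /*
         * Planes 3..7: the charset id is the SECOND byte. Using c1 here
         * (the prefix byte) would give 0x9d - 0xf6 + 0xa3 = 0x4a under
         * the assumed values, which is not a valid plane byte.
         */
        out[0] = SS2;
        out[1] = (unsigned char) (mic[1] - LC_CNS11643_3 + 0xa3);
        out[2] = mic[2];
        out[3] = mic[3];
        *consumed = 4;
        return 4;
    }

    *consumed = 1;
    return 0;
}
```

Under these assumptions, the input bytes 0x9d 0xf6 0xa1 0xa1 come out as 0x8e 0xa3 0xa1 0xa1, that is, SS2 followed by the Plane 3 marker and the two code bytes.
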
> >
> > Thanks for your very quick reply!
>
> You are welcome.
>
> > I think you are right, but I have not tested it, because the original Big5
> > encoding does not contain characters in CNS 11643-1992 Plane 3.
> > But I will have a chance to test it: we are seeking support for Big5E
> > (an extended Big5 encoding) in PostgreSQL, though most people who use
> > PostgreSQL in Taiwan only care about Big5 encoding.
> >
> > Would you like to answer some mb related questions for me? I am a newbie :P
> >
> > 1.) The 2nd byte of a Big5 character can overlap with ASCII,
> > such as '\' (this makes it very hard for many programs to work with Big5).
>
> As long as the frontend side knows that the current client-side encoding
> is Big5, this should be no problem, at least for libpq. It examines the
> first byte of each Big5 character. If it is greater than 0x7f, then it
> must be a double-byte Hanzi, so libpq reads 2 bytes in this case, no
> matter whether the second byte is '\'.
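
The scanning rule described above can be sketched as follows. This is an illustration under stated assumptions, not libpq's actual code: the helper names big5_char_len and count_ascii_backslashes are hypothetical. The point is that a 0x5c byte only counts as a backslash when it is not the trail byte of a double-byte character.

```c
#include <stddef.h>

/*
 * Length in bytes of the Big5 character starting at s: a first byte
 * above 0x7f starts a double-byte character. A dangling lead byte at
 * the end of the string is treated as a single byte so the scan never
 * runs past the terminator.
 */
static int
big5_char_len(const unsigned char *s)
{
    return (s[0] > 0x7f && s[1] != '\0') ? 2 : 1;
}

/*
 * Count the bytes that are genuine ASCII backslashes, skipping any
 * 0x5c that is the second byte of a double-byte Big5 character.
 */
static size_t
count_ascii_backslashes(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0')
    {
        int len = big5_char_len(s);

        if (len == 1 && *s == '\\')
            n++;
        s += len;           /* step over the whole character */
    }
    return n;
}
```

A byte-at-a-time scanner that does not know the encoding would instead see the trail byte 0x5c as an escape character, which is consistent with the quoting symptom described in the question below.
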
>
> > For example: if we initdb -E MULE_INTERNAL first,
> > SET CLIENT_ENCODING TO 'BIG5', and
> > INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is backslash\',...),
> > then we cannot successfully complete this SQL INSERT: the psql prompt
> > changes, but psql does not execute it. If we initdb -E EUC_TW, it's OK.
> > Is this a parsing problem? What solution would you suggest?
>
> Hmm. initdb -E MULE_INTERNAL should work as well. Let me dig into the
> problem. It would be nice if you could send me the Big5 data for
> testing by private mail.
>
> BTW I would not recommend using "SET CLIENT_ENCODING TO 'BIG5'" for
> on-the-fly encoding changes, since that way the frontend side has no
> idea what the client encoding is. 7.0.x overcomes this problem by
> introducing the new \encoding command. For 6.5 or earlier I would
> recommend using the PGCLIENTENCODING environment variable.
>
> > 2.) Is using MULE_INTERNAL faster than EUC_TW as the backend encoding when
> > PostgreSQL processes Big5 data? (Judging from the mb sources, it seems
> > BIG5->big52mic()->mic2euc_tw()->EUC_TW needs 2 code conversion steps,
> > but BIG5->big52mic()->MULE_INTERNAL only needs one.)
>
> Yes. But the difference would be very small. The expensive part is a
> table look-up in big52mic.
>
> BTW 7.1 will support automatic encoding conversion between Unicode
> (UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.
>
> > 3.) Does PostgreSQL's ODBC driver support mb?
>
> I don't think so. For Japanese (EUC_JP/SJIS), Kataoka has made patches
> to enable MB support in ODBC. It should not be very difficult to
> support EUC_TW/Big5, though I don't know for sure.
> --
> Tatsuo Ishii
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
