Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: cch(at)cc(dot)kmu(dot)edu(dot)tw
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport
Date: 2000-11-14 05:54:34
Message-ID: 20001114145434O.t-ishii@sra.co.jp
Lists: pgsql-docs pgsql-hackers pgsql-patches

[Cced to hackers list]

> > BTW I have found another bug with EUC_TW support. line 917 in conv.c:
> >
> > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> >
> > this should be:
> >
> > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> >
> > Otherwise, CNS 11643-1992 Plane 3 and higher won't work. Could you test
> > it with CNS 11643-1992 Plane 3 or higher?
>
> Thanks for your very quick reply!!

You are welcome.
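To make the line-917 fix concrete, here is a minimal C sketch that converts a single CNS 11643 Plane 3..7 character from MULE internal code to EUC_TW. It assumes, as in the pg_wchar.h of that period, that such a character is stored as a private leading byte (LCPRV2, 0x9d) followed by a charset-id byte (LC_CNS11643_3 = 0xf6 for Plane 3, and so on), and that EUC_TW writes it as SS2 (0x8e), then 0xa0 + plane, then the two code bytes. Under those assumptions c1 holds the leading byte rather than the charset id, which is why the plane byte has to be computed from *mic++. The function name mic_prv2_to_euc_tw is hypothetical.

```c
/* Assumed constants, meant to match PostgreSQL's pg_wchar.h of that era;
 * check them against your own source tree. */
#define SS2            0x8e  /* EUC single-shift 2 */
#define LCPRV2         0x9d  /* MULE leading byte for private 2-byte charsets */
#define LC_CNS11643_3  0xf6  /* MULE charset id for CNS 11643-1992 Plane 3 */

/* Hypothetical helper: convert one MULE-internal character laid out as
 * (LCPRV2, charset id, byte1, byte2) into EUC_TW as
 * (SS2, 0xa0 + plane, byte1, byte2).
 * Returns the number of bytes written, or 0 if mic does not start with LCPRV2. */
static int mic_prv2_to_euc_tw(const unsigned char *mic, unsigned char *p)
{
    int c1 = *mic++;                 /* leading byte: LCPRV2, NOT the charset id */

    if (c1 != LCPRV2)
        return 0;
    p[0] = SS2;
    /* Fixed line: derive the plane byte from the charset-id byte. */
    p[1] = (unsigned char) (*mic++ - LC_CNS11643_3 + 0xa3);
    p[2] = *mic++;                   /* first code byte, copied as-is */
    p[3] = *mic++;                   /* second code byte, copied as-is */
    return 4;
}
```

With the old line, the plane byte would have been computed from c1 as 0x9d - 0xf6 + 0xa3 = 0x4a, which is not a valid EUC_TW plane byte at all.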

> I think you are right, but I have not tested it,
> because the original Big5 encoding does not contain characters in
> CNS 11643-1992 Plane 3.
> But I will have a chance to test it: we are seeking support for Big5E
> (an extended Big5 encoding) in PostgreSQL, though most people who use
> PostgreSQL in Taiwan only care about the Big5 encoding.
>
> Would you like to answer some mb related questions for me? I am a newbie :P
>
> 1.) The second byte of a Big5 character can overlap with ASCII,
> e.g. '\' (this is very bad for many programs that work with Big5).

As long as the frontend side knows that the current client encoding is
Big5, this should be no problem, at least for libpq. It examines the
first byte of each Big5 character: if it is greater than 0x7f, it must
be the start of a double-byte Hanzi, so libpq reads 2 bytes in this
case, no matter whether the second byte is '\'.
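The rule described above can be sketched in C as follows. This is not libpq's code; big5_mblen, big5_strlen and big5_count_ascii_backslashes are hypothetical helpers that only show how a scanner which knows the text is Big5 steps over a trailing 0x5c byte instead of treating it as '\'.

```c
#include <stddef.h>

/* Length of the Big5 character starting at s: a first byte above 0x7f
 * starts a double-byte character, anything else is a single ASCII byte. */
static int big5_mblen(const unsigned char *s)
{
    return (*s > 0x7f) ? 2 : 1;
}

/* Count the characters in a Big5 buffer of len bytes.
 * Returns -1 if the buffer ends in the middle of a double-byte character. */
static int big5_strlen(const unsigned char *s, size_t len)
{
    int count = 0;
    size_t i = 0;

    while (i < len) {
        int n = big5_mblen(s + i);
        if (i + n > len)
            return -1;      /* lead byte without its trailing byte */
        i += n;
        count++;
    }
    return count;
}

/* Count backslashes that are real ASCII characters. A 0x5c that is the
 * second byte of a double-byte character is skipped with its lead byte. */
static int big5_count_ascii_backslashes(const unsigned char *s, size_t len)
{
    int count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] > 0x7f) {
            i += 2;         /* skip the whole double-byte character */
            continue;
        }
        if (s[i] == '\\')
            count++;
        i++;
    }
    return count;
}
```

For example, for the bytes 'a', 0xb3, 0x5c, '\\' the sketch counts three characters and only one real backslash.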

> For example: if we first run initdb -E MULE_INTERNAL, then
> SET CLIENT_ENCODING TO 'BIG5' and
> INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is
> backslash\',...),
> this SQL INSERT does not complete: the psql prompt changes, but psql
> does not execute it. If we run initdb -E EUC_TW instead, it's OK.
> Is this a parsing problem? What do you suggest as a solution?

Hum. initdb -E MULE_INTERNAL should work as well. Let me dig into the
problem. It would be nice if you could send me the Big5 data for
testing in a private mail.

BTW I would not recommend "SET CLIENT_ENCODING TO 'BIG5'" for changing
the encoding on the fly, because that way the frontend side has no
idea what the client encoding is. 7.0.x overcomes this problem by
introducing the new \encoding command. For 6.5 or earlier I would
recommend using the PGCLIENTENCODING environment variable.
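One plausible explanation for the psql symptom in question 1 (an assumption, since the cause was still being investigated) is a quote scanner that does not know the client encoding. The hypothetical find_closing_quote below contrasts the two cases: scanning byte by byte takes a Big5 trailing 0x5c as an escape, swallows the closing quote and reports the literal as unterminated, so the frontend keeps waiting for more input; the Big5-aware version finds the quote. It is a simplified sketch, not psql's lexer.

```c
#include <stddef.h>

/* Hypothetical helper: s points just past an opening single quote.
 * Returns the index of the closing quote, or -1 if the literal looks
 * unterminated. Simplified: ignores '' doubling inside literals.
 * big5_aware == 0 models a frontend that does not know the client encoding. */
static long find_closing_quote(const unsigned char *s, size_t len, int big5_aware)
{
    size_t i = 0;

    while (i < len) {
        if (big5_aware && s[i] > 0x7f) {
            i += 2;         /* lead byte and trailing byte form one character */
            continue;
        }
        if (s[i] == '\\') {
            i += 2;         /* backslash escapes the byte after it */
            continue;
        }
        if (s[i] == '\'')
            return (long) i;
        i++;
    }
    return -1;
}
```

For the bytes 'x', 0xb3, 0x5c, '\'', ')' the byte-by-byte scan returns -1 (unterminated), while the Big5-aware scan returns 3.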

> 2.) Is using MULE_INTERNAL faster than EUC_TW as the backend encoding when
> PostgreSQL processes Big5 data? (According to the mb sources,
> BIG5->big52mic()->mic2euc_tw()->EUC_TW
> needs 2 code conversion procedures, but BIG5->big52mic()->MULE_INTERNAL
> only needs one.)

Yes. But the difference would be very small. The expensive part is a
table look-up in big52mic.
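To illustrate where the cost goes, here is a minimal sketch of a per-character binary-search table lookup of the kind a Big5 to CNS 11643 conversion needs. The entry layout, big5_lookup and cmp_entry are hypothetical, and the table contents are left to the caller; the real Big5/CNS mapping is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical table entry: a Big5 code and the CNS 11643 plane/code it maps to. */
typedef struct {
    unsigned short big5;
    unsigned char  plane;
    unsigned short cns;
} big5_cns_entry;

/* bsearch comparator: key is a Big5 code, elem is a table entry. */
static int cmp_entry(const void *key, const void *elem)
{
    unsigned short k = *(const unsigned short *) key;
    unsigned short e = ((const big5_cns_entry *) elem)->big5;
    return (k > e) - (k < e);
}

/* Look up one Big5 code in a table sorted by big5; NULL if absent.
 * This O(log n) search runs once per character. */
static const big5_cns_entry *
big5_lookup(const big5_cns_entry *table, size_t n, unsigned short code)
{
    return bsearch(&code, table, n, sizeof table[0], cmp_entry);
}
```

By contrast, the MIC to EUC_TW step only rewrites a few bytes per character, which is why the extra conversion adds little.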

BTW 7.1 will support automatic encoding conversion between Unicode
(UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.

> 3.) Does PostgreSQL's ODBC driver support mb?

I don't think so. For Japanese (EUC_JP/SJIS), Kataoka has made patches
to enable MB support in ODBC. Supporting EUC_TW/Big5 should not be very
difficult, but I'm not sure.
--
Tatsuo Ishii
