Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)

From: Jacky Hui Chun Kit <ckhui(at)school(dot)net(dot)hk>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)
Date: 1998-11-24 04:04:24
Message-ID: 365A3048.BDBCD7A@school.net.hk
Lists: pgsql-hackers

Dear Tatsuo,

Thank you very much for your reply; I have been waiting for one for a while.
I really want to help out with this, but I have some problems:
1. I am not familiar with the ODBC standard and its internals.
2. I am not familiar with language encodings and conversion.

However, I am used to programming in C, C++ and Perl, both under UNIX and with VC5.
Maybe we can cooperate with people in other East Asian countries (Korea,
Taiwan) to create a customized ODBC driver for each language encoding we have.

Besides, Perl does work with Chinese; in fact, I only have a problem with ODBC
now. When I use bind variables in DBD::Pg, everything works. I think this is
because assigning variables in Perl with single quotes instead of double
quotes ($var='sth';) prevents Perl from interpreting the value of the
variable, and thus everything works. Of course, I am using EUC_TW as my default
encoding for initdb and createdb.

Can you tell me where I can find more information on language encodings and on
writing an ODBC driver? I have read the source of PsqlODBC and I think they are
using the Cygnus GNU toolset. Can you tell me more about what you have done?
Thanks.

Best Rgds,
Jacky Hui

Tatsuo Ishii wrote:

> At 3:46 AM 98.11.22 +0800, Hui Chun Kit, Jacky wrote:
> >Dear all,
> >
> > I am having a difficult time using PostgreSQL 6.4 with Chinese BIG5
> >characters. I just want to store BIG5 characters in a text field
> >and retrieve them correctly. I used --enable-mb when I compiled. I am on RH5.1
>
> What did you choose for an encoding?
> BIG5 is not supported yet in 6.4, sorry.
>
> >Intel platform, running PG 6.4.
> > I just created a test table, test:
> > create table test ( name char(20), age int);
> > For most of the characters in BIG5, it works and I can insert
> >Chinese names into the table, but for some characters, especially those
> >in my own name, it does not work. I have investigated the problem but
> >cannot solve it.
> > It is because my name in BIG5 encoding is "5cb3 54ab c7b3",
> >or, as a byte sequence in octal and ASCII, "263 \ 253 T 263 307", where
> >every two bytes form one character.
> >That is, "5cb3" ('263' '\') is the first character and "54ab" ('253'
> >'T') is the second character. The problem is that somewhere
> >between storing the value into the database and the client frontend (Perl,
> >MS Access), the '\' is interpreted as an escape, and thus the stored value
> >becomes "263 253 T 263 307", which is corrupted.
> > I don't know exactly where the problem is, as when I use MySQL it
> >works fine.
>
> As you can see, the problem is that BIG5 can contain some special characters
> in the second byte that confuse the PostgreSQL parser. We had a similar
> experience with the Japanese Shift JIS code (SJIS). To address the problem
> we added functionality somewhere in the backend to convert between SJIS
> and EUC_JP (which never confuses the parser and thus can be used as a
> backend native encoding).
>
> To solve your problem, there might be 2 solutions:
>
> o Use EUC_TW(Chinese EUC Code) instead of BIG5. 6.4 should be happy
> with EUC_TW. To use EUC_TW, just create a new database:
> createdb mydb with encoding='EUC_TW'.
> or do "configure --with-mb=EUC_TW" and re-install. then re-create
> the database.
>
> Alternatively, you can use Unicode (UTF-8). Use "UNICODE" instead of
> "EUC_TW" in this case.
>
> o Add an encoding conversion module between BIG5 and EUC_TW to PostgreSQL.
> I wish I could do that, but I have no idea how to write it
> (I don't speak Chinese at all). So your contribution would be welcome!
>
> BTW, you said you use perl. I'm surprised to hear that perl
> can handle BIG5. Is it a modified version (localized version)?
>
> You also use M$Access. So you must be using ODBC, which makes me worry about
> its support for BIG5. Here in Japan we use a localized version of the
> ODBC driver that supports SJIS.
>
> What I want to say here is that your problem may not be only in PostgreSQL
> itself. I recommend you make sure that your clients can handle
> BIG5.
> --
> Tatsuo Ishii
> t-ishii(at)sra(dot)co(dot)jp
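
To make the byte-level problem described in the quoted message concrete, here is a minimal Python sketch. It only uses the byte values given in the report (octal 263 \ 253 T 263 307). The function names naive_unescape and double_backslashes are hypothetical helpers written for this illustration; they are not PostgreSQL's parser nor DBD::Pg's quoting code, just an assumed model of a byte-oriented layer that treats '\' (0x5C) as an escape character.

```python
# The six bytes of the name as reported in the message:
# octal 263, '\', 253, 'T', 263, 307 (three two-byte BIG5 characters).
NAME_BIG5 = bytes([0o263, 0x5C, 0o253, 0x54, 0o263, 0o307])


def naive_unescape(data: bytes) -> bytes:
    """Assumed model of a byte-oriented parser: every backslash is dropped
    and the byte that follows it is kept literally."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            i += 1  # skip the backslash itself
        out.append(data[i])
        i += 1
    return bytes(out)


def double_backslashes(data: bytes) -> bytes:
    """Hypothetical client-side workaround: double every 0x5C byte so that
    one survives a single round of backslash unescaping."""
    return data.replace(b"\\", b"\\\\")


if __name__ == "__main__":
    damaged = naive_unescape(NAME_BIG5)
    print("original:", NAME_BIG5.hex(" "))
    print("damaged: ", damaged.hex(" "))
    restored = naive_unescape(double_backslashes(NAME_BIG5))
    print("round trip with doubled backslashes intact:", restored == NAME_BIG5)
```

Under this model the second byte of the first character is lost, which matches the corrupted value "263 253 T 263 307" in the report. Doubling the backslash is only a stopgap on the client side; using an encoding whose trailing bytes never collide with ASCII, such as EUC_TW as suggested above, avoids the problem altogether.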
