Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)

From: t-ishii(at)sra(dot)co(dot)jp (Tatsuo Ishii)
To: jacky_hui(at)geocities(dot)com, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)
Date: 1998-11-23 14:27:04
Message-ID: 199811231429.XAA10436@meshsv26.tk.mesh.ad.jp
Lists: pgsql-hackers

At 3:46 AM 98.11.22 +0800, Hui Chun Kit, Jacky wrote:
>Dear all,
>
> I have some difficult time in using postgresql 6.4 with chinese BIG5
>characters. I am just looking for storing BIG characters in a text field
>and retrieve correctly. I have --enable-mb when I compile. I am on RH5.1

Which encoding did you choose?
Sorry, BIG5 is not supported in 6.4 yet.

>intel platform, running PG 6.4.
> I just created a testing table test
> create test ( name char(20), age int);
> For most of the characters in BIG5, it works and I can insert
>chinese name into the table, but for some characters, esp my own name,
>it does not work. I have check the problem out . But cannot solve it.
> It is because in my name under BIG5 coding it is "5cb3 54ab c7b3"
>or
>in ASCII code "263 \ 253 T 263 307" where two byte is a character.
>That is "5cb3" ('263' '\' ) is the first character and '54ab' ( '253'
>'T' ) becomes the second character. The problem is that somewhere
>between storing the value into database and client frontend (Perl,
>MSAccess) , the '\' is interpreted and thus the stored value becomes
>"263 253 T 263 307" which is distorted.
> I don't know where exactly is the problem as when I use Mysql, it is
>working fine.

As you can see, the problem is that the second byte of a BIG5 character
can be a special character (such as '\') that confuses the PostgreSQL
parser. We had a similar experience with the Japanese Shift JIS code
(SJIS). To address that problem, we added functionality to the backend
that converts between SJIS and EUC_JP (which never confuses the parser
and thus can be used as one of the backend's native encodings).
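
To make the failure concrete, here is a minimal Python sketch. The byte
values are the ones reported in the quoted message; naive_unescape and
octal_dump are hypothetical helpers written only for illustration. This
is not PostgreSQL code, and the real parser's escape handling is more
involved. The sketch assumes only that a backslash followed by an
ordinary byte is dropped and the following byte is kept.

```python
# Bytes of the name as reported in the quoted message
# (BIG5, two bytes per character): b3 5c / ab 54 / b3 c7.
NAME_BIG5 = bytes([0xB3, 0x5C, 0xAB, 0x54, 0xB3, 0xC7])


def naive_unescape(data: bytes) -> bytes:
    """Hypothetical byte-oblivious unescape (an assumption, not the real parser):
    drop each backslash and keep the byte that follows it."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            out.append(data[i + 1])  # keep the escaped byte, lose the backslash
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def octal_dump(data: bytes) -> str:
    """Show printable ASCII as-is and every other byte in octal."""
    return " ".join(chr(b) if 0x20 <= b < 0x7F else format(b, "o") for b in data)


print(octal_dump(NAME_BIG5))                  # bytes as sent by the client
print(octal_dump(naive_unescape(NAME_BIG5)))  # bytes after the backslash is eaten
```

The first line printed matches the "263 \ 253 T 263 307" sequence in the
report and the second matches the distorted "263 253 T 263 307": the
second byte of the first character is lost, so every later byte pair is
misaligned.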

To solve your problem, there are two possible solutions:

o Use EUC_TW (Chinese EUC code) instead of BIG5. 6.4 should be happy
with EUC_TW. To use EUC_TW, just create a new database from psql:
create database mydb with encoding='EUC_TW';
or run "configure --with-mb=EUC_TW", re-install, and then re-create
the database.

Alternatively, you can use Unicode (UTF-8). Use "UNICODE" instead of
"EUC_TW" in this case.

o Add an encoding conversion module between BIG5 and EUC_TW to PostgreSQL.
I wish I could do that myself, but I have no idea how to write it
(I don't speak Chinese at all), so your contribution would be welcome!
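
As a rough way to see why EUC-style encodings and UTF-8 are safe where
BIG5 and SJIS are not, the following Python sketch checks whether any
multi-byte character in a string encodes to bytes in the ASCII range.
The helper name ascii_bytes_in_multibyte is hypothetical, and the check
relies on Python's own codecs rather than on anything in PostgreSQL.
Python's standard library has no EUC_TW codec, so UTF-8 and EUC_JP
stand in for the "safe" side here.

```python
def ascii_bytes_in_multibyte(text: str, encoding: str) -> bool:
    """Return True if some multi-byte character of `text`, encoded with
    `encoding`, contains a byte below 0x80 (for example 0x5C, the backslash)."""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False


# The first character of the reported name, rebuilt from its BIG5 bytes.
first_char = bytes([0xB3, 0x5C]).decode("big5")
# KATAKANA LETTER SO, whose SJIS encoding ends in 0x5C.
katakana_so = "\u30bd"

for text, encoding in [
    (first_char, "big5"),
    (first_char, "utf-8"),
    (katakana_so, "shift_jis"),
    (katakana_so, "euc_jp"),
]:
    print(encoding, ascii_bytes_in_multibyte(text, encoding))
```

A client-side check like this can flag data that would be at risk
before it is sent to the backend.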

BTW, you said you use Perl. I'm surprised to hear that Perl
can handle BIG5. Is it a modified (localized) version?

You also use M$Access, so you must be using ODBC, which makes me worry
about its support for BIG5. Here in Japan we use a localized version of
the ODBC driver that supports SJIS.

What I want to say here is that your problem may not lie only in
PostgreSQL itself. I recommend you make sure that your clients can
handle BIG5.
--
Tatsuo Ishii
t-ishii(at)sra(dot)co(dot)jp
