Multibyte characters handling bug in varchar()

From: "Edward" <edward(at)lijianghy(dot)com>
To: <pgsql-bugs(at)postgresql(dot)org>
Cc: <t-ishii(at)sra(dot)co(dot)jp>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Multibyte characters handling bug in varchar()
Date: 2002-07-10 04:59:54
Message-ID: 001d01c227ce$a28ed0d0$06c6a8c0@ncvillas.com
Lists: pgsql-bugs

Hello,

I am using PostgreSQL 7.1 on Linux (RedHat 7.1).

My database encoding is 'EUC_CN'.

The application accesses the database through the PG JDBC 2.0 driver.

I have defined a field in a table like this:

create table test1 (
    id integer not null,
    memo varchar(128)
);

The memo field lets users record comments and similar notes. They enter Chinese text (GB2312 or GBK encoding) mixed with ASCII.

The problem happens when:

The input string is longer than 128 bytes, and the 128th and 129th bytes together form one Chinese character (Chinese characters take two bytes each in GB2312 or GBK encoding).
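
To make the condition concrete, here is a minimal Python sketch. It assumes the 128 limit is applied in bytes, as the behaviour described in this report suggests; the helper names truncate_bytes and truncate_euc_cn are made up for illustration and are not PostgreSQL functions. It shows how a plain byte cut leaves half a character behind, and how a boundary-aware cut avoids that.

```python
LIMIT = 128  # byte limit, taken from varchar(128) in the table above

def truncate_bytes(data: bytes, limit: int) -> bytes:
    """Naive cut: keep the first `limit` bytes, even mid-character."""
    return data[:limit]

def truncate_euc_cn(data: bytes, limit: int) -> bytes:
    """Boundary-aware cut: never keep half of a two-byte character."""
    i = 0
    while i < len(data):
        # In EUC_CN/GB2312, a byte with the high bit set starts a two-byte character.
        width = 2 if data[i] >= 0x80 else 1
        if i + width > limit:
            break
        i += width
    return data[:i]

# 127 ASCII bytes, then Chinese text: bytes 128 and 129 form one character.
text = "a" * 127 + "中文"
raw = text.encode("gb2312")

naive = truncate_bytes(raw, LIMIT)   # ends with a dangling lead byte
safe = truncate_euc_cn(raw, LIMIT)   # stops before the split character
```

Here naive is 128 bytes long and can no longer be decoded as GB2312, while safe stops at 127 bytes and still decodes cleanly.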

The problem is:

The insert query runs without any error, but the getString method then returns a zero-length String for the field.

More complications:

When I pg_dump the database and try to restore it, the script produced by pg_dump (with the -D flag, which dumps the data with attribute names) cannot be restored. Checking the script, I found that the memo field of this record is dumped without the closing single quote (because the 128th byte and the single quote that follows it are actually read together as another, unrecognized Chinese character), and that is why the restore fails.
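
The effect of the lost quote can be sketched in Python as follows. This is only an illustration of the explanation above, not the actual PostgreSQL or pg_dump code: read_quoted_literal is a hypothetical scanner that treats every byte with the high bit set as the first half of a two-byte character and skips the next byte unseen. It also ignores escaped quotes, for brevity.

```python
def read_quoted_literal(sql: bytes, start: int):
    """Return the bytes of the literal opening at sql[start], or None if it never closes."""
    assert sql[start:start + 1] == b"'"
    i = start + 1
    while i < len(sql):
        if sql[i] >= 0x80:
            i += 2                      # skip the lead byte and whatever follows it
        elif sql[i:i + 1] == b"'":
            return sql[start + 1:i]     # found the closing quote
        else:
            i += 1
    return None                         # ran off the end: literal never terminated

whole = "中文".encode("gb2312")
ok = b"VALUES (5,'" + whole + b"');"
broken = b"VALUES (5,'" + whole[:3] + b"');"   # last character cut in half

ok_literal = read_quoted_literal(ok, ok.index(b"'"))
broken_literal = read_quoted_literal(broken, broken.index(b"'"))
```

With the intact string the scanner returns the four bytes of the literal. With the truncated string the dangling lead byte swallows the closing quote, so the literal never ends and the function returns None.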

Below is the dump for this record:

INSERT INTO
"test1" ("id","memo") VALUES (5,'òò×??§í???μ?ê?òàé??óGHμ¥?a??ì????à?÷òaê?5??1è??á3è?òòμúò??ó?a?ì°2??á?ò???D??±1¤?¥ì?á???ìì£?í?μ?ê±????×¢òa?1?é??ê1òμ?÷í???ò?òa?ó?');

I feel that multibyte characters are not properly handled in this case. I look forward to hearing from the dev team.

Finally, I think PostgreSQL is an excellent database, but the name PostgreSQL seems very difficult to pronounce, and that is probably one obstacle preventing people from learning more about it.

Thanks to the dev team for your hard work; you have done an excellent job!

Best Regards,
