Re: Multibyte characters handling bug in varchar()

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: edward(at)lijianghy(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Multibyte characters handling bug in varchar()
Date: 2002-07-10 05:46:36
Message-ID: 20020710.144636.71574064.t-ishii@sra.co.jp
Lists: pgsql-bugs

I suspect that the data stored in your database was not EUC_CN. GB2312 (or
GBK) is different from EUC_CN. Can you send me the INSERT statement
text in gzipped form, to prevent accidental changes while it is relayed
along the mail chain?
--
Tatsuo Ishii

> Hello,
>
> I am using PostgreSQL 7.1 on Linux (Red Hat 7.1).
>
> My database encoding is 'EUC_CN'.
>
> The application accesses the database through the PostgreSQL JDBC 2.0 driver.
>
> I defined a field in a table like this:
>
> create table test1 (
>     id integer not null,
>     memo varchar(128)
> );
>
> The memo field is for users to record comments or similar notes. They enter Chinese (GB2312 or GBK encoding) mixed with ASCII.
>
> The problem happens when:
>
> The input string is longer than 128 bytes, and the 128th and 129th bytes together make up one Chinese character (Chinese characters take two bytes in GB2312 or GBK encoding).
>
> The problem is:
>
> The INSERT runs without any error, but the getString method then returns a zero-length String for the field.
>
> More complications:
>
> When I pg_dump the database and try to restore it, the script produced by pg_dump (with the -D flag, which dumps data as INSERT statements with explicit column names) cannot be restored. When I checked the script, I found that the memo field of this record is dumped without its closing single quote. The 128th byte and the single quote after it are actually read as one more, unrecognized, Chinese character, which is why the restore fails.
>
> Below is the dump for this record:
>
> INSERT INTO
> "test1" ("id","memo") VALUES (5,'òò×??§í???μ?ê?òàé??óGHμ¥?a??ì????à?÷òaê?5??1è??á3è?òòμúò??ó?a?ì°2??á?ò???D??±1¤?¥ì?á???ìì£?í?μ?ê±????×¢òa?1?é??ê1òμ?÷í???ò?òa?ó?');
>
>
> I suspect multibyte characters are not handled properly in this case. Looking forward to hearing from the dev team.
>
> Finally, I think PostgreSQL is an excellent database, but the name PostgreSQL seems very difficult to pronounce, and that is probably one obstacle that keeps people from learning more about it.
>
> Thanks for the hard work of the dev team; you have done excellent work!
>
> Best Regards,
>
>
>
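The report describes a value where the 128th and 129th bytes form one two-byte character. That failure is what happens when a value is cut at a fixed byte count. The hypothetical `clip_euc_cn` below is a minimal sketch, not PostgreSQL's code. It assumes well-formed EUC_CN input and a limit counted in bytes, as in the report. Instead of leaving a dangling lead byte, it backs off to the last complete character.

```python
def clip_euc_cn(data: bytes, max_bytes: int) -> bytes:
    """Truncate EUC_CN bytes to at most `max_bytes` without splitting a character.

    Assumes well-formed EUC_CN: bytes below 0x80 are one-byte characters,
    and any byte 0x80 or above starts a two-byte character.
    """
    end = 0
    while end < len(data):
        width = 1 if data[end] < 0x80 else 2
        if end + width > max_bytes:
            break  # the next character would not fit completely
        end += width
    return data[:end]


# 127 ASCII bytes followed by two Chinese characters: 131 bytes in total,
# so bytes 128 and 129 form the first Chinese character.
value = ("a" * 127 + "中文").encode("gb2312")

naive = value[:128]              # ends with a lone lead byte (0xD6)
safe = clip_euc_cn(value, 128)   # drops the whole character instead

print(len(naive), len(safe))     # 128 127
print(safe.decode("gb2312") == "a" * 127)  # True
try:
    naive.decode("gb2312")
except UnicodeDecodeError as exc:
    print("naive cut is not valid text:", exc.reason)
```

Clipping at a character boundary keeps the stored value decodable, at the cost of storing one byte fewer than the limit.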
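The reporter explains the broken dump this way: the dangling lead byte and the closing quote are read as a single character. A scanner that applies the EUC two-byte rule blindly shows how that happens. `find_closing_quote` is hypothetical; it is not the actual parser used by pg_dump, psql, or the server. It assumes that any byte 0x80 or above starts a two-byte character, so it skips the following byte without looking at it.

```python
def find_closing_quote(literal: bytes) -> int:
    """Return the index of the quote that closes `literal`, or -1 if none.

    `literal` must start with an opening single quote. Hypothetical naive rule:
    a byte >= 0x80 starts a two-byte character, so the following byte is
    skipped without being examined, even if it is a quote.
    """
    i = 1  # skip the opening quote
    while i < len(literal):
        if literal[i] >= 0x80:
            i += 2  # treat this byte and the next one as a single character
        elif literal[i] == ord("'"):
            return i
        else:
            i += 1
    return -1


complete = b"'" + "中".encode("gb2312") + b"'"  # ' D6 D0 '
clipped = b"'" + b"\xd6" + b"'"                 # ' D6 ' : lone lead byte

print(find_closing_quote(complete))  # 3
print(find_closing_quote(clipped))   # -1: the quote was swallowed
```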

In response to: Multibyte characters handling bug in varchar() (Edward, 2002-07-10 04:59:54)