Re: Bug in UTF8-Validation Code?

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql(at)markdilger(dot)com
Cc: ishii(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)commandprompt(dot)com, kleptog(at)svana(dot)org, all(at)adv(dot)magwien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-04 21:52:06
Message-ID: 20070405.065206.51277520.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> Tatsuo Ishii wrote:
>
> > <SNIP>. I think we need to continue the design discussion, probably
> > targeting 8.4, not 8.3.
>
> The discussion came about because Andrew - Supernews noticed that chr()
> returns invalid utf8, and we're trying to fix all the bugs with invalid
> utf8 in the system. Something needs to be done, even if we just check
> the result of the current chr() implementation and throw an error on
> invalid results. But do we want to make this minor change for 8.3 and
> then change it again for 8.4?
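
A minimal sketch of the stop-gap Mark describes (keep the current chr() and reject any result that is not valid UTF-8), written in Python for illustration only. The names legacy_chr, checked_chr and InvalidByteSequence are hypothetical, and legacy_chr only assumes the 8.2 behaviour visible in the session further down: the low byte of the argument comes back as a single raw byte.

```python
class InvalidByteSequence(ValueError):
    """Hypothetical stand-in for the server's invalid-encoding error."""


def legacy_chr(code: int) -> bytes:
    # Assumed 8.2 behaviour: emit the low byte of the argument as one raw byte.
    return bytes([code & 0xFF])


def checked_chr(code: int) -> bytes:
    # Stop-gap: keep the old result, but refuse it if it is not valid UTF-8.
    result = legacy_chr(code)
    try:
        result.decode("utf-8")
    except UnicodeDecodeError:
        raise InvalidByteSequence(
            f'invalid byte sequence for encoding "UTF8": 0x{result[0]:02x}'
        ) from None
    return result


if __name__ == "__main__":
    print(checked_chr(65))  # b'A'
    try:
        checked_chr(128)
    except InvalidByteSequence as exc:
        print(exc)  # invalid byte sequence for encoding "UTF8": 0x80
```

This only turns silent corruption into an error; chr(128) through chr(255) would still be unusable in a UTF-8 database.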

My opinion was in the part of my previous mail that you snipped:
limit chr() to the ASCII range.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
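
The ASCII-only restriction can be sketched roughly as follows. This is a hypothetical Python illustration, not PostgreSQL code; the function name and error message are made up. The point is that code points 0..127 map to the same single byte in UTF-8 and in any ASCII-compatible encoding, so the result can never be an invalid byte sequence.

```python
def ascii_only_chr(code: int) -> str:
    # Hypothetical sketch: accept only code points 0..127, which encode to
    # the same single byte in UTF-8 and in any ASCII-compatible encoding.
    if not 0 <= code <= 127:
        raise ValueError(f"chr() argument {code} is outside the ASCII range")
    return chr(code)


if __name__ == "__main__":
    print(ascii_only_chr(65))  # A
    try:
        ascii_only_chr(128)
    except ValueError as exc:
        print(exc)  # chr() argument 128 is outside the ASCII range
```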

> Here's an example of the current problem. It's an 8.2.3 database with
> utf8.en_US encoding
>
>
> mark=# create table testutf8 (t text);
> CREATE TABLE
> mark=# insert into testutf8 (t) (select chr(gs) from generate_series(0,255) as gs);
> INSERT 0 256
> mark=# \copy testutf8 to testutf8.data
> mark=# truncate testutf8;
> TRUNCATE TABLE
> mark=# \copy testutf8 from testutf8.data
> ERROR: invalid byte sequence for encoding "UTF8": 0x80
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
> CONTEXT: COPY testutf8, line 129
>
>
>
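
The reported failure point is consistent with chr() returning raw single bytes: rows for gs = 0..127 are plain ASCII, while the row for gs = 128 is line 129 of the dump and holds the lone byte 0x80, a UTF-8 continuation byte that cannot start a character. The Python sketch below reproduces the reported line and byte under that assumption; first_invalid_line is a hypothetical helper and the dump contents are modeled rather than read from testutf8.data.

```python
def first_invalid_line(lines):
    # Return (line number, first byte of that line) for the first line that
    # is not valid UTF-8, or None if every line decodes cleanly.
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return lineno, raw[0]
    return None


# Assumed dump contents: one line per row, row n holding the single byte n
# (COPY's backslash escaping is ignored; it does not change the line count).
dump = [bytes([gs]) for gs in range(256)]

if __name__ == "__main__":
    print(first_invalid_line(dump))   # (129, 128), i.e. line 129, byte 0x80
    print("\u0080".encode("utf-8"))   # b'\xc2\x80': U+0080 needs two bytes
```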
