Re: BUG #3638: UTF8 Character encoding does NOT work

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: fil(at)internetmediapro(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3638: UTF8 Character encoding does NOT work
Date: 2007-09-28 01:26:03
Message-ID: 20070928.102603.04702911.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> Tatsuo Ishii wrote:
> > Why do you think that an UTF-8 encoded string starting with 0x92 is
> > valid?
> >
> > 0x92 can appear in the second, third or fourth octet, but should never
> > appear in the first octet.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >
> >> The following bug has been logged online:
> >>
> >> Bug reference: 3638
> >> Logged by: Fil Matthews
> >> Email address: fil(at)internetmediapro(dot)com
> >> PostgreSQL version: 8-1 , 8-2
> >> Operating system: Linux Debian - Windows XP
> >> Description: UTF8 Character encoding does NOT work
> >> Details:
> >>
> >> Judging from the amount of Google page hits with the exact same problem I am
> >> surprised and mystified by this obvious flaw in Postgres Technology..
> >>
> >> Just how is one expected to work with UTF8 character sets when all and
> >> every attempt at using even Postgres clients produces the SAME problem
> >> every time ???
> >>
> >> "invalid byte sequence for encoding "UTF8": 0x92"
> >>
> >> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8
> >>
> >> Tables test.text -> (Chararcter varying 10)
> >>
> >> In any Postgres Client ie psql , dbadmin III
> >>
> >> Insert into test values ( chr(146));;
> >>
> >>
> >> Query returned successfully: 1 rows affected, 32 ms execution time.
> >>
> >> copy test to '/tmp/testfile.txt';
> >>
> >>
> >> Query returned successfully: 1 rows affected, 15 ms execution time.
> >>
> >> copy test from '/tmp/testfile.txt';
> >>
> >>
> >> Come on are you serious?? .. Just how does one work with completly valid
> >> data that has an ascii 128 + value ??
> >>
> >> Currently this flaw make Postgres an un-useable database technology .. Or
> >> can some-one please explain this and a possible work around .. ??
> >>
> >> Thank You
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 1: if posting/reading through Usenet, please send an appropriate
> >> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> >> message can get through to the mailing list cleanly
> >>
> >
> >
>
> Sorry But I don't agree.. Why can't Postgres store a legitimate 8 bit
> byte value that is below 255?? and treat it as text ..
> Not being able to do this this makes Postgres unusable.. for storing
> TEXT values..
>
> I do not know ANY other database technology that doesn't allow some form
> of storing a legitimate 8 bit byte ...
>
> Even the most simplest open -source database in the world (and most
> popular) can do this..
>
> The biggest and best (Thank you Larry) can do this ...
>
> Postgres can't.
>
> In other words You are claiming that UTF8 is actually UTF7 ....

No.

> There are 8 bits in a byte.. not 7 .. If UTF8 can't by definition
> store 8 bits then what standard can??

UTF-8 does not accept arbitary 8 bit characters. The byte ranges UTF-8
accepts are precisely defined in the standard. If our implementation
is different from it, please let us know.

> The technology is wrong and it is incorrect... If one looks at the
> output of the copy file
> od -c then QUITE correctly the 8 bit value is stored as the value
> given..
>
> What then is the problem in putting this value back in the text field it
> came from ??

PostgreSQL needs to follow the standard. I suggest you double check
the standard.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tatsuo Ishii 2007-09-28 03:39:51 Re: BUG #3638: UTF8 Character encoding does NOT work
Previous Message Tatsuo Ishii 2007-09-28 01:23:29 Re: BUG #3638: UTF8 Character encoding does NOT work