From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | fil(at)internetmediapro(dot)com |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #3638: UTF8 Character encoding does NOT work |
Date: | 2007-09-28 01:26:03 |
Message-ID: | 20070928.102603.04702911.t-ishii@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
> Tatsuo Ishii wrote:
> > Why do you think that an UTF-8 encoded string starting with 0x92 is
> > valid?
> >
> > 0x92 can appear in the second, third or fourth octet, but should never
> > appear in the first octet.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >
> >> The following bug has been logged online:
> >>
> >> Bug reference: 3638
> >> Logged by: Fil Matthews
> >> Email address: fil(at)internetmediapro(dot)com
> >> PostgreSQL version: 8-1 , 8-2
> >> Operating system: Linux Debian - Windows XP
> >> Description: UTF8 Character encoding does NOT work
> >> Details:
> >>
> >> Judging from the amount of Google page hits with the exact same problem I am
> >> surprised and mystified by this obvious flaw in Postgres Technology..
> >>
> >> Just how is one expected to work with UTF8 character sets when all and
> >> every attempt at using even Postgres clients produces the SAME problem
> >> every time ???
> >>
> >> "invalid byte sequence for encoding "UTF8": 0x92"
> >>
> >> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8
> >>
> >> Tables test.text -> (Chararcter varying 10)
> >>
> >> In any Postgres Client ie psql , dbadmin III
> >>
> >> Insert into test values ( chr(146));;
> >>
> >>
> >> Query returned successfully: 1 rows affected, 32 ms execution time.
> >>
> >> copy test to '/tmp/testfile.txt';
> >>
> >>
> >> Query returned successfully: 1 rows affected, 15 ms execution time.
> >>
> >> copy test from '/tmp/testfile.txt';
> >>
> >>
> >> Come on are you serious?? .. Just how does one work with completly valid
> >> data that has an ascii 128 + value ??
> >>
> >> Currently this flaw make Postgres an un-useable database technology .. Or
> >> can some-one please explain this and a possible work around .. ??
> >>
> >> Thank You
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 1: if posting/reading through Usenet, please send an appropriate
> >> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> >> message can get through to the mailing list cleanly
> >>
> >
> >
>
> Sorry But I don't agree.. Why can't Postgres store a legitimate 8 bit
> byte value that is below 255?? and treat it as text ..
> Not being able to do this this makes Postgres unusable.. for storing
> TEXT values..
>
> I do not know ANY other database technology that doesn't allow some form
> of storing a legitimate 8 bit byte ...
>
> Even the most simplest open -source database in the world (and most
> popular) can do this..
>
> The biggest and best (Thank you Larry) can do this ...
>
> Postgres can't.
>
> In other words You are claiming that UTF8 is actually UTF7 ....
No.
> There are 8 bits in a byte.. not 7 .. If UTF8 can't by definition
> store 8 bits then what standard can??
UTF-8 does not accept arbitary 8 bit characters. The byte ranges UTF-8
accepts are precisely defined in the standard. If our implementation
is different from it, please let us know.
> The technology is wrong and it is incorrect... If one looks at the
> output of the copy file
> od -c then QUITE correctly the 8 bit value is stored as the value
> given..
>
> What then is the problem in putting this value back in the text field it
> came from ??
PostgreSQL needs to follow the standard. I suggest you double check
the standard.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2007-09-28 03:39:51 | Re: BUG #3638: UTF8 Character encoding does NOT work |
Previous Message | Tatsuo Ishii | 2007-09-28 01:23:29 | Re: BUG #3638: UTF8 Character encoding does NOT work |