Tom Lane wrote:
> Note that the reference to byte order betrays the implicit context
> assumption: that we're talking about UTF16 or UTF32 representation.
Note that there is no implicit context assumption in the Unicode FAQ.
It covers UTF-8, UTF-16 and UTF-32 equally.
Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If
yes, then can I still assume the remaining UTF-8 bytes are in big-endian
order?
A: Yes, UTF-8 can contain a BOM. However, it makes /no/ difference as to
the endianness of the byte stream. UTF-8 always has the same byte order.
An initial BOM is /only/ used as a signature --- an indication that an
otherwise unmarked text file is in UTF-8. Note that some recipients of
UTF-8 encoded data do not expect a BOM. Where UTF-8 is used
/transparently/ in 8-bit environments, the use of a BOM will
interfere with any protocol or file format that expects specific ASCII
characters at the beginning, such as the use of "#!" at the beginning
of Unix shell scripts.
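
To illustrate the FAQ's point, here is a minimal sketch in Python of treating
a leading BOM purely as a signature and dropping it before the data is handed
on. The function name strip_utf8_bom and the sample script are hypothetical,
chosen only for this example; this is not code from any existing tool.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a shell script saved by an editor that prepends a BOM.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"

# With the BOM in place, the file no longer starts with "#!".
print(script.startswith(b"#!"))

# After stripping the signature, the expected ASCII prefix is back.
print(strip_utf8_bom(script).startswith(b"#!"))
```

When decoding to text rather than passing bytes through, Python's built-in
"utf-8-sig" codec skips a leading BOM in the same way.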
> BOM is useless in UTF8, no matter what Microsoft thinks. Any tool that
> relies on it to detect UTF8 data has to have a workaround for overriding
> that detection, or it's broken to the point of uselessness.
This kind of brokenness currently exists the other way around (see
my reference to the Perl script I'm using to work around it).
Note also that I'm not citing a Microsoft FAQ but the Unicode FAQ.
I'm also not trying to convert Postgres into a Microsoft tool (I'm
pretty happy it isn't), but I'm pointing to existing compatibility issues
on a platform that others have decided to support.
Belonging to the huge group of users who have little or no choice in
what OS they are using, and being from a country where plain ASCII isn't
enough to cover all existing characters, I think this is probably fair.
It's a pity that the Unicode standard actually allows something that can
cause problems, but blaming the non-platform again doesn't solve the
problem.