Microsoft harmful extensions to 8859-X charsets (was: Continuing encoding fun....)

From: Marc Herbert <Marc(dot)Herbert(at)continuent(dot)com>
To: pgsql-odbc(at)postgresql(dot)org
Subject: Microsoft harmful extensions to 8859-X charsets (was: Continuing encoding fun....)
Date: 2005-11-24 14:10:23
Message-ID: 878xveyw4w.fsf@meije.emic.fr
Lists: pgsql-odbc

"Dave Page" <dpage(at)vale-housing(dot)co(dot)uk> writes:

>> By the way 0x8A is not in the range of latin4
>> <http://czyborra.com/charsets/iso8859.html#ISO-8859-4>
>
> http://www.gar.no/home/mats/8859-4.htm says differently, however, I
> can't claim to know enough about encoding issues to refute
> either. I've been forced to learn what I can about the subject to help
> maintain this driver and certainly may have got the wrong end of the
> stick on one or more points!

The page from gar.no is just a dump of the *Microsoft-extended* latin4
charset.

The standards committee deliberately left a gap in every LATIN-X charset
between 0x80 and 0x9F, because those bytes turn into (harmful) C0
control characters (0x00 to 0x1F) once their 8th bit is stripped by
accident. You can see this very clearly in, for instance, this table:
<http://en.wikipedia.org/wiki/ISO_8859-4>

If you follow the links from gar.no itself, you can land here:
<http://en.wikipedia.org/wiki/ISO_8859> with tons of links (like the
ECMA standards for instance) showing this gap.
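To make the 8th-bit argument concrete, here is a minimal Python sketch. The helper name `strip_high_bit` is hypothetical, used only for illustration. It simulates a 7-bit channel clearing bit 8, then uses the standard `unicodedata` module to confirm that every byte in 0x80..0x9F lands on a C0 control character (Unicode category "Cc"). The byte 0x8A from this thread, for example, becomes 0x0A, a line feed.

```python
import unicodedata

# The range that ISO 8859-X leaves free of printable characters.
GAP = range(0x80, 0xA0)

def strip_high_bit(byte: int) -> int:
    """Hypothetical helper: simulate an accidental 7-bit channel clearing bit 8."""
    return byte & 0x7F

# Every gap byte, once stripped, falls into 0x00..0x1F, all C0 controls.
stripped = [strip_high_bit(b) for b in GAP]
all_controls = all(unicodedata.category(chr(c)) == "Cc" for c in stripped)

print(f"0x8A -> 0x{strip_high_bit(0x8A):02X}")  # 0x0A is LINE FEED
print("all stripped gap bytes are control characters:", all_controls)
```

By contrast, a printable Latin-1 byte such as 0xE9 strips to 0x69 ('i'), which is a wrong letter but a harmless one. That difference is why the committee kept graphic characters out of 0x80..0x9F.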

Microsoft, being Microsoft, jumped in that gap. Those non-standard
Microsoft characters now plague the web as clearly explained here:

<http://home.earthlink.net/~bobbau/platforms/specialchars/#windows>
or here:
<http://www.cs.tut.fi/~jkorpela/www/windows-chars.html>
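The following rough Python sketch shows what "jumping in the gap" looks like in practice. It compares how the standard `iso8859_4` codec and Microsoft's Windows-1252 code page (`cp1252`) decode the bytes 0x80..0x9F. Windows-1252 is used here only as a well-known example. The thread does not say which Microsoft code page the gar.no table was dumped from. The function names `printable_in_gap` and `gap_offsets` are hypothetical, and the heuristic is a simplification. Gap bytes in data labelled ISO-8859-X usually mean the data is really a Windows code page.

```python
from __future__ import annotations

# The 0x80..0x9F range that ISO 8859-X reserves for C1 control codes.
GAP = bytes(range(0x80, 0xA0))

def printable_in_gap(codec: str) -> dict[int, str]:
    """Hypothetical helper: map each gap byte to the printable char it decodes to."""
    found = {}
    for b in GAP:
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte left undefined by this code page
        if ch.isprintable():  # C1 controls (category Cc) are not printable
            found[b] = ch
    return found

def gap_offsets(data: bytes) -> list[int]:
    """Hypothetical heuristic: offsets of gap bytes in supposedly ISO-8859-X data."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

iso = printable_in_gap("iso8859_4")
win = printable_in_gap("cp1252")
print("ISO 8859-4 printable gap chars:", len(iso))
print("Windows-1252 printable gap chars:", len(win))
print("0x8A as cp1252:", win[0x8A])

# Text typed on Windows with "smart quotes" contains gap bytes.
sample = "\u201cquoted\u201d".encode("cp1252")
print("gap bytes at offsets:", gap_offsets(sample))
```

Decoding such data as real ISO-8859-X turns the curly quotes into invisible C1 controls. This is the kind of breakage the linked pages describe.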
