Codepage Win1252

From: Jörg Schulz <jschulz(at)sgbs(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Codepage Win1252
Date: 2003-09-12 07:55:22
Message-ID: bjru5n$7uk$1@news.hub.org
Lists: pgsql-general

I have been missing this codepage for quite some time, but I
was able to patch another mapping that I did not need to suit my needs.
Unfortunately I wasn't able to add a completely new mapping.
Maybe one of you can do this better... :-)

I have included some tiny scripts that generate at least the
mappings needed for the src/backend/utils/mb/Unicode/*.map files.

I hope this helps PostgreSQL support more codepages.

Jörg

jschulz(at)opal:~/programme/postgresql/pgmaps> cat README

Copy and paste a codepage table from the reference at
http://www.microsoft.com/globaldev/reference/cphome.mspx

For example, win1252 was copied from
http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

Then run, e.g., make_pgmaps win1252 ...

jschulz(at)opal:~/programme/postgresql/pgmaps> cat make_pgmaps
#!/bin/bash
# Generate both .map files for each codepage table named on the command line.

for f in "$@"; do
    echo -e "${f}: ${f}_to_utf8.map...\c"
    ./codepage_to_utf8 "${f}" > "${f}_to_utf8.map"
    echo -e "ok utf8_to_${f}.map...\c"
    ./utf8_to_codepage "${f}" > "utf8_to_${f}.map"
    echo "ok"
done

jschulz(at)opal:~/programme/postgresql/pgmaps> cat codepage_to_utf8
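#!/bin/bash
# Print one {codepage byte, UTF-8 bytes} entry per row of the table in $1.

while read -r l;
do
    cp=$(echo "$l" | cut -c1-2)      # codepage byte, e.g. 80
    u16=$(echo "$l" | cut -c8-11)    # Unicode code point, e.g. 20AC
    u8=$(echo "0x$u16" | recode utf-16/x4..utf-8/x4)
    echo " {0x00$cp, $u8},"
done < "$1" | awk '{print tolower($0)}'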

jschulz(at)opal:~/programme/postgresql/pgmaps> cat utf8_to_codepage
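#!/bin/bash
# Print one {UTF-8 bytes, codepage byte} entry per row of the table in $1,
# sorted by the UTF-8 value.

while read -r l;
do
    cp=$(echo "$l" | cut -c1-2)      # codepage byte, e.g. 80
    u16=$(echo "$l" | cut -c8-11)    # Unicode code point, e.g. 20AC
    u8=$(echo "0x$u16" | recode utf-16/x4..utf-8/x4)
    echo " {$u8, 0x00$cp},"
done < "$1" | awk '{print tolower($0)}' | sort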
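
Both scripts lean on recode to turn the code point into UTF-8 bytes. If recode is not at hand, the same arithmetic can be sketched in plain bash. The function name utf8_hex below is made up for illustration and is not part of the scripts above; it assumes the code point is given as bare hex digits and lies in the range U+0000 to U+FFFF, which covers every row of the win1252 table.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts: encode one code
# point (bare hex digits, e.g. 20AC, at most U+FFFF) as UTF-8 and print
# the bytes as a single lowercase 0x... value.
utf8_hex() {
    local cp=$((16#$1))
    if (( cp < 0x80 )); then
        # 1 byte: 0xxxxxxx
        printf '0x%02x\n' "$cp"
    elif (( cp < 0x800 )); then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '0x%02x%02x\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    else
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '0x%02x%02x%02x\n' $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_hex 20AC   # prints 0xe282ac (EURO SIGN)
utf8_hex 0192   # prints 0xc692   (LATIN SMALL LETTER F WITH HOOK)
utf8_hex 00A0   # prints 0xc2a0   (NO-BREAK SPACE)
```

The three sample values match the corresponding entries in the generated .map files.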

jschulz(at)opal:~/programme/postgresql/pgmaps> cat win1252
80 = U+20AC : EURO SIGN
82 = U+201A : SINGLE LOW-9 QUOTATION MARK
83 = U+0192 : LATIN SMALL LETTER F WITH HOOK
84 = U+201E : DOUBLE LOW-9 QUOTATION MARK
85 = U+2026 : HORIZONTAL ELLIPSIS
86 = U+2020 : DAGGER
87 = U+2021 : DOUBLE DAGGER
88 = U+02C6 : MODIFIER LETTER CIRCUMFLEX ACCENT
89 = U+2030 : PER MILLE SIGN
8A = U+0160 : LATIN CAPITAL LETTER S WITH CARON
8B = U+2039 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
8C = U+0152 : LATIN CAPITAL LIGATURE OE
8E = U+017D : LATIN CAPITAL LETTER Z WITH CARON
91 = U+2018 : LEFT SINGLE QUOTATION MARK
92 = U+2019 : RIGHT SINGLE QUOTATION MARK
93 = U+201C : LEFT DOUBLE QUOTATION MARK
94 = U+201D : RIGHT DOUBLE QUOTATION MARK
95 = U+2022 : BULLET
96 = U+2013 : EN DASH
97 = U+2014 : EM DASH
98 = U+02DC : SMALL TILDE
99 = U+2122 : TRADE MARK SIGN
9A = U+0161 : LATIN SMALL LETTER S WITH CARON
9B = U+203A : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
9C = U+0153 : LATIN SMALL LIGATURE OE
9E = U+017E : LATIN SMALL LETTER Z WITH CARON
9F = U+0178 : LATIN CAPITAL LETTER Y WITH DIAERESIS
A0 = U+00A0 : NO-BREAK SPACE
A1 = U+00A1 : INVERTED EXCLAMATION MARK
A2 = U+00A2 : CENT SIGN
A3 = U+00A3 : POUND SIGN
A4 = U+00A4 : CURRENCY SIGN
A5 = U+00A5 : YEN SIGN
A6 = U+00A6 : BROKEN BAR
A7 = U+00A7 : SECTION SIGN
A8 = U+00A8 : DIAERESIS
A9 = U+00A9 : COPYRIGHT SIGN
AA = U+00AA : FEMININE ORDINAL INDICATOR
AB = U+00AB : LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
AC = U+00AC : NOT SIGN
AD = U+00AD : SOFT HYPHEN
AE = U+00AE : REGISTERED SIGN
AF = U+00AF : MACRON
B0 = U+00B0 : DEGREE SIGN
B1 = U+00B1 : PLUS-MINUS SIGN
B2 = U+00B2 : SUPERSCRIPT TWO
B3 = U+00B3 : SUPERSCRIPT THREE
B4 = U+00B4 : ACUTE ACCENT
B5 = U+00B5 : MICRO SIGN
B6 = U+00B6 : PILCROW SIGN
B7 = U+00B7 : MIDDLE DOT
B8 = U+00B8 : CEDILLA
B9 = U+00B9 : SUPERSCRIPT ONE
BA = U+00BA : MASCULINE ORDINAL INDICATOR
BB = U+00BB : RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
BC = U+00BC : VULGAR FRACTION ONE QUARTER
BD = U+00BD : VULGAR FRACTION ONE HALF
BE = U+00BE : VULGAR FRACTION THREE QUARTERS
BF = U+00BF : INVERTED QUESTION MARK
C0 = U+00C0 : LATIN CAPITAL LETTER A WITH GRAVE
C1 = U+00C1 : LATIN CAPITAL LETTER A WITH ACUTE
C2 = U+00C2 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
C3 = U+00C3 : LATIN CAPITAL LETTER A WITH TILDE
C4 = U+00C4 : LATIN CAPITAL LETTER A WITH DIAERESIS
C5 = U+00C5 : LATIN CAPITAL LETTER A WITH RING ABOVE
C6 = U+00C6 : LATIN CAPITAL LETTER AE
C7 = U+00C7 : LATIN CAPITAL LETTER C WITH CEDILLA
C8 = U+00C8 : LATIN CAPITAL LETTER E WITH GRAVE
C9 = U+00C9 : LATIN CAPITAL LETTER E WITH ACUTE
CA = U+00CA : LATIN CAPITAL LETTER E WITH CIRCUMFLEX
CB = U+00CB : LATIN CAPITAL LETTER E WITH DIAERESIS
CC = U+00CC : LATIN CAPITAL LETTER I WITH GRAVE
CD = U+00CD : LATIN CAPITAL LETTER I WITH ACUTE
CE = U+00CE : LATIN CAPITAL LETTER I WITH CIRCUMFLEX
CF = U+00CF : LATIN CAPITAL LETTER I WITH DIAERESIS
D0 = U+00D0 : LATIN CAPITAL LETTER ETH
D1 = U+00D1 : LATIN CAPITAL LETTER N WITH TILDE
D2 = U+00D2 : LATIN CAPITAL LETTER O WITH GRAVE
D3 = U+00D3 : LATIN CAPITAL LETTER O WITH ACUTE
D4 = U+00D4 : LATIN CAPITAL LETTER O WITH CIRCUMFLEX
D5 = U+00D5 : LATIN CAPITAL LETTER O WITH TILDE
D6 = U+00D6 : LATIN CAPITAL LETTER O WITH DIAERESIS
D7 = U+00D7 : MULTIPLICATION SIGN
D8 = U+00D8 : LATIN CAPITAL LETTER O WITH STROKE
D9 = U+00D9 : LATIN CAPITAL LETTER U WITH GRAVE
DA = U+00DA : LATIN CAPITAL LETTER U WITH ACUTE
DB = U+00DB : LATIN CAPITAL LETTER U WITH CIRCUMFLEX
DC = U+00DC : LATIN CAPITAL LETTER U WITH DIAERESIS
DD = U+00DD : LATIN CAPITAL LETTER Y WITH ACUTE
DE = U+00DE : LATIN CAPITAL LETTER THORN
DF = U+00DF : LATIN SMALL LETTER SHARP S
E0 = U+00E0 : LATIN SMALL LETTER A WITH GRAVE
E1 = U+00E1 : LATIN SMALL LETTER A WITH ACUTE
E2 = U+00E2 : LATIN SMALL LETTER A WITH CIRCUMFLEX
E3 = U+00E3 : LATIN SMALL LETTER A WITH TILDE
E4 = U+00E4 : LATIN SMALL LETTER A WITH DIAERESIS
E5 = U+00E5 : LATIN SMALL LETTER A WITH RING ABOVE
E6 = U+00E6 : LATIN SMALL LETTER AE
E7 = U+00E7 : LATIN SMALL LETTER C WITH CEDILLA
E8 = U+00E8 : LATIN SMALL LETTER E WITH GRAVE
E9 = U+00E9 : LATIN SMALL LETTER E WITH ACUTE
EA = U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX
EB = U+00EB : LATIN SMALL LETTER E WITH DIAERESIS
EC = U+00EC : LATIN SMALL LETTER I WITH GRAVE
ED = U+00ED : LATIN SMALL LETTER I WITH ACUTE
EE = U+00EE : LATIN SMALL LETTER I WITH CIRCUMFLEX
EF = U+00EF : LATIN SMALL LETTER I WITH DIAERESIS
F0 = U+00F0 : LATIN SMALL LETTER ETH
F1 = U+00F1 : LATIN SMALL LETTER N WITH TILDE
F2 = U+00F2 : LATIN SMALL LETTER O WITH GRAVE
F3 = U+00F3 : LATIN SMALL LETTER O WITH ACUTE
F4 = U+00F4 : LATIN SMALL LETTER O WITH CIRCUMFLEX
F5 = U+00F5 : LATIN SMALL LETTER O WITH TILDE
F6 = U+00F6 : LATIN SMALL LETTER O WITH DIAERESIS
F7 = U+00F7 : DIVISION SIGN
F8 = U+00F8 : LATIN SMALL LETTER O WITH STROKE
F9 = U+00F9 : LATIN SMALL LETTER U WITH GRAVE
FA = U+00FA : LATIN SMALL LETTER U WITH ACUTE
FB = U+00FB : LATIN SMALL LETTER U WITH CIRCUMFLEX
FC = U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
FD = U+00FD : LATIN SMALL LETTER Y WITH ACUTE
FE = U+00FE : LATIN SMALL LETTER THORN
FF = U+00FF : LATIN SMALL LETTER Y WITH DIAERESIS
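
Since the table is produced by copy and paste, it may be worth checking that no rows were lost on the way. A minimal sketch, assuming the "XX = U+..." row layout shown above; the function name missing_bytes is hypothetical. For the win1252 table it should report 81, 8D, 8F, 90 and 9D, which are the byte values the table leaves out.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts: print every byte
# value from 80 to FF (uppercase hex) that has no row in the table file $1.
missing_bytes() {
    local b hex
    for (( b = 0x80; b <= 0xFF; b++ )); do
        printf -v hex '%02X' "$b"
        # A row starts with the two hex digits followed by " = ".
        grep -q "^$hex = " "$1" || echo "$hex"
    done
}

# Usage (assumes the table was saved as ./win1252):
#   missing_bytes win1252
```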

jschulz(at)opal:~/programme/postgresql/pgmaps> cat utf8_to_win1252.map
{0xc2a0, 0x00a0},
{0xc2a1, 0x00a1},
{0xc2a2, 0x00a2},
{0xc2a3, 0x00a3},
{0xc2a4, 0x00a4},
{0xc2a5, 0x00a5},
{0xc2a6, 0x00a6},
{0xc2a7, 0x00a7},
{0xc2a8, 0x00a8},
{0xc2a9, 0x00a9},
{0xc2aa, 0x00aa},
{0xc2ab, 0x00ab},
{0xc2ac, 0x00ac},
{0xc2ad, 0x00ad},
{0xc2ae, 0x00ae},
{0xc2af, 0x00af},
{0xc2b0, 0x00b0},
{0xc2b1, 0x00b1},
{0xc2b2, 0x00b2},
{0xc2b3, 0x00b3},
{0xc2b4, 0x00b4},
{0xc2b5, 0x00b5},
{0xc2b6, 0x00b6},
{0xc2b7, 0x00b7},
{0xc2b8, 0x00b8},
{0xc2b9, 0x00b9},
{0xc2ba, 0x00ba},
{0xc2bb, 0x00bb},
{0xc2bc, 0x00bc},
{0xc2bd, 0x00bd},
{0xc2be, 0x00be},
{0xc2bf, 0x00bf},
{0xc380, 0x00c0},
{0xc381, 0x00c1},
{0xc382, 0x00c2},
{0xc383, 0x00c3},
{0xc384, 0x00c4},
{0xc385, 0x00c5},
{0xc386, 0x00c6},
{0xc387, 0x00c7},
{0xc388, 0x00c8},
{0xc389, 0x00c9},
{0xc38a, 0x00ca},
{0xc38b, 0x00cb},
{0xc38c, 0x00cc},
{0xc38d, 0x00cd},
{0xc38e, 0x00ce},
{0xc38f, 0x00cf},
{0xc390, 0x00d0},
{0xc391, 0x00d1},
{0xc392, 0x00d2},
{0xc393, 0x00d3},
{0xc394, 0x00d4},
{0xc395, 0x00d5},
{0xc396, 0x00d6},
{0xc397, 0x00d7},
{0xc398, 0x00d8},
{0xc399, 0x00d9},
{0xc39a, 0x00da},
{0xc39b, 0x00db},
{0xc39c, 0x00dc},
{0xc39d, 0x00dd},
{0xc39e, 0x00de},
{0xc39f, 0x00df},
{0xc3a0, 0x00e0},
{0xc3a1, 0x00e1},
{0xc3a2, 0x00e2},
{0xc3a3, 0x00e3},
{0xc3a4, 0x00e4},
{0xc3a5, 0x00e5},
{0xc3a6, 0x00e6},
{0xc3a7, 0x00e7},
{0xc3a8, 0x00e8},
{0xc3a9, 0x00e9},
{0xc3aa, 0x00ea},
{0xc3ab, 0x00eb},
{0xc3ac, 0x00ec},
{0xc3ad, 0x00ed},
{0xc3ae, 0x00ee},
{0xc3af, 0x00ef},
{0xc3b0, 0x00f0},
{0xc3b1, 0x00f1},
{0xc3b2, 0x00f2},
{0xc3b3, 0x00f3},
{0xc3b4, 0x00f4},
{0xc3b5, 0x00f5},
{0xc3b6, 0x00f6},
{0xc3b7, 0x00f7},
{0xc3b8, 0x00f8},
{0xc3b9, 0x00f9},
{0xc3ba, 0x00fa},
{0xc3bb, 0x00fb},
{0xc3bc, 0x00fc},
{0xc3bd, 0x00fd},
{0xc3be, 0x00fe},
{0xc3bf, 0x00ff},
{0xc592, 0x008c},
{0xc593, 0x009c},
{0xc5a0, 0x008a},
{0xc5a1, 0x009a},
{0xc5b8, 0x009f},
{0xc5bd, 0x008e},
{0xc5be, 0x009e},
{0xc692, 0x0083},
{0xcb86, 0x0088},
{0xcb9c, 0x0098},
{0xe28093, 0x0096},
{0xe28094, 0x0097},
{0xe28098, 0x0091},
{0xe28099, 0x0092},
{0xe2809a, 0x0082},
{0xe2809c, 0x0093},
{0xe2809d, 0x0094},
{0xe2809e, 0x0084},
{0xe280a0, 0x0086},
{0xe280a1, 0x0087},
{0xe280a2, 0x0095},
{0xe280a6, 0x0085},
{0xe280b0, 0x0089},
{0xe280b9, 0x008b},
{0xe280ba, 0x009b},
{0xe282ac, 0x0080},
{0xe284a2, 0x0099},

jschulz(at)opal:~/programme/postgresql/pgmaps> cat win1252_to_utf8.map
{0x0080, 0xe282ac},
{0x0082, 0xe2809a},
{0x0083, 0xc692},
{0x0084, 0xe2809e},
{0x0085, 0xe280a6},
{0x0086, 0xe280a0},
{0x0087, 0xe280a1},
{0x0088, 0xcb86},
{0x0089, 0xe280b0},
{0x008a, 0xc5a0},
{0x008b, 0xe280b9},
{0x008c, 0xc592},
{0x008e, 0xc5bd},
{0x0091, 0xe28098},
{0x0092, 0xe28099},
{0x0093, 0xe2809c},
{0x0094, 0xe2809d},
{0x0095, 0xe280a2},
{0x0096, 0xe28093},
{0x0097, 0xe28094},
{0x0098, 0xcb9c},
{0x0099, 0xe284a2},
{0x009a, 0xc5a1},
{0x009b, 0xe280ba},
{0x009c, 0xc593},
{0x009e, 0xc5be},
{0x009f, 0xc5b8},
{0x00a0, 0xc2a0},
{0x00a1, 0xc2a1},
{0x00a2, 0xc2a2},
{0x00a3, 0xc2a3},
{0x00a4, 0xc2a4},
{0x00a5, 0xc2a5},
{0x00a6, 0xc2a6},
{0x00a7, 0xc2a7},
{0x00a8, 0xc2a8},
{0x00a9, 0xc2a9},
{0x00aa, 0xc2aa},
{0x00ab, 0xc2ab},
{0x00ac, 0xc2ac},
{0x00ad, 0xc2ad},
{0x00ae, 0xc2ae},
{0x00af, 0xc2af},
{0x00b0, 0xc2b0},
{0x00b1, 0xc2b1},
{0x00b2, 0xc2b2},
{0x00b3, 0xc2b3},
{0x00b4, 0xc2b4},
{0x00b5, 0xc2b5},
{0x00b6, 0xc2b6},
{0x00b7, 0xc2b7},
{0x00b8, 0xc2b8},
{0x00b9, 0xc2b9},
{0x00ba, 0xc2ba},
{0x00bb, 0xc2bb},
{0x00bc, 0xc2bc},
{0x00bd, 0xc2bd},
{0x00be, 0xc2be},
{0x00bf, 0xc2bf},
{0x00c0, 0xc380},
{0x00c1, 0xc381},
{0x00c2, 0xc382},
{0x00c3, 0xc383},
{0x00c4, 0xc384},
{0x00c5, 0xc385},
{0x00c6, 0xc386},
{0x00c7, 0xc387},
{0x00c8, 0xc388},
{0x00c9, 0xc389},
{0x00ca, 0xc38a},
{0x00cb, 0xc38b},
{0x00cc, 0xc38c},
{0x00cd, 0xc38d},
{0x00ce, 0xc38e},
{0x00cf, 0xc38f},
{0x00d0, 0xc390},
{0x00d1, 0xc391},
{0x00d2, 0xc392},
{0x00d3, 0xc393},
{0x00d4, 0xc394},
{0x00d5, 0xc395},
{0x00d6, 0xc396},
{0x00d7, 0xc397},
{0x00d8, 0xc398},
{0x00d9, 0xc399},
{0x00da, 0xc39a},
{0x00db, 0xc39b},
{0x00dc, 0xc39c},
{0x00dd, 0xc39d},
{0x00de, 0xc39e},
{0x00df, 0xc39f},
{0x00e0, 0xc3a0},
{0x00e1, 0xc3a1},
{0x00e2, 0xc3a2},
{0x00e3, 0xc3a3},
{0x00e4, 0xc3a4},
{0x00e5, 0xc3a5},
{0x00e6, 0xc3a6},
{0x00e7, 0xc3a7},
{0x00e8, 0xc3a8},
{0x00e9, 0xc3a9},
{0x00ea, 0xc3aa},
{0x00eb, 0xc3ab},
{0x00ec, 0xc3ac},
{0x00ed, 0xc3ad},
{0x00ee, 0xc3ae},
{0x00ef, 0xc3af},
{0x00f0, 0xc3b0},
{0x00f1, 0xc3b1},
{0x00f2, 0xc3b2},
{0x00f3, 0xc3b3},
{0x00f4, 0xc3b4},
{0x00f5, 0xc3b5},
{0x00f6, 0xc3b6},
{0x00f7, 0xc3b7},
{0x00f8, 0xc3b8},
{0x00f9, 0xc3b9},
{0x00fa, 0xc3ba},
{0x00fb, 0xc3bb},
{0x00fc, 0xc3bc},
{0x00fd, 0xc3bd},
{0x00fe, 0xc3be},
{0x00ff, 0xc3bf},
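
The two files should be exact inverses of each other. A quick way to confirm that before dropping them into the source tree might look like the sketch below. The function name maps_are_inverse is made up for illustration, and the check assumes the one-entry-per-line "{a, b}," layout shown above.

```shell
#!/bin/bash
# Hypothetical check, not part of the original scripts: succeed only if
# every {a, b} entry in file $1 appears as {b, a} in file $2 and vice versa.
maps_are_inverse() {
    # Drop braces and commas so each line is a bare "a b" pair, swap the
    # pairs from the first file, then compare the two sorted lists.
    diff <(tr -d '{},' < "$1" | awk 'NF == 2 {print $2, $1}' | sort) \
         <(tr -d '{},' < "$2" | awk 'NF == 2 {print $1, $2}' | sort) > /dev/null
}

# Usage (assumes both files are in the current directory):
#   maps_are_inverse win1252_to_utf8.map utf8_to_win1252.map && echo consistent
```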
