Re: MSSQL to PostgreSQL : Encoding problem

From: "Brandon Aiken" <BAiken(at)winemantech(dot)com>
To: <thewild(at)free(dot)fr>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: MSSQL to PostgreSQL : Encoding problem
Date: 2006-11-22 18:55:55
Message-ID: F8E84F0F56445B4CB39E019EF67DACBA3C4BBD@exchsrvr.winemantech.com
Lists: pgsql-general

It also might be a big/little endian problem, although I always thought that was platform specific, not locale specific.

Try the UCS-2-INTERNAL and UCS-4-INTERNAL codepages in iconv, which should use the two-byte or four-byte versions of UCS encoding using the system's default endian setting.

iconv supports many Unicode encoding names (several are aliases for the same encoding):
UTF-8
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
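
To make the byte-order point concrete, here is a minimal Python sketch (Python is an assumption here; the thread itself only uses iconv). It encodes one character with Python's `utf-16-le`, `utf-16-be`, and `utf-16` codecs, which roughly correspond to the LE, BE, and BOM-prefixed entries in the list above. Python has no direct equivalent of iconv's `-INTERNAL` and `-SWAPPED` names, so `sys.byteorder` stands in for "the system's default endian setting".

```python
import sys

text = "é"  # U+00E9, a character that is a single byte in Latin-1/Win1252

le = text.encode("utf-16-le")      # low byte first:  b'\xe9\x00'
be = text.encode("utf-16-be")      # high byte first: b'\x00\xe9'
with_bom = text.encode("utf-16")   # byte order mark, then native byte order

# What "native" means depends on the machine running the code.
native = le if sys.byteorder == "little" else be

# Decoding with the wrong byte order does not fail here; it silently
# yields a different code point (U+E900 instead of U+00E9).
swapped = le.decode("utf-16-be")

print(le, be, with_bom, repr(swapped))
```

Either way, the two-byte forms contain a 0x00 byte for any character below U+0100, which a single-byte encoding such as Latin-9 would not produce for ordinary text.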

Gee, didn't Unicode just so simplify this codepage mess? Remember when it was just ASCII, EBCDIC, ANSI, and localized codepages?

--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Arnaud Lesauvage
Sent: Wednesday, November 22, 2006 12:38 PM
To: Arnaud Lesauvage; General
Subject: Re: [GENERAL] MSSQL to PostgreSQL : Encoding problem

Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Alvaro Herrera wrote:
>> >Arnaud Lesauvage wrote:
>> >
>> >>mydb=# SET client_encoding TO LATIN9;
>> >>SET
>> >>mydb=# COPY statistiques.detailrecherche (log_gid,
>> >>champrecherche, valeurrecherche) FROM
>> >>'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
>> >>ERROR: invalid byte sequence for encoding "LATIN9": 0x00
>> >>HINT: This error can also happen if the byte sequence does
>> >>not match the encoding expected by the server, which is
>> >>controlled by "client_encoding".
>> >
>> >Huh, why do you have a "0x00" byte in there? That's certainly not
>> >Latin9 (nor UTF8 as far as I know).
>> >
>> >Is the file actually Latin-something or did you convert it to something
>> >else at some point?
>>
>> This is the file generated by DTS with "ANSI" encoding. It
>> was not altered in any way after that !
>> The doc states that ANSI exports with the local codepage
>> (which is Win1252). That's all I know. :(
>
> I thought Win1252 was supposed to be almost the same as Latin1. While
> I'd expect certain differences, I wouldn't expect it to use 0x00 as
> data!
>
> Maybe you could have DTS export Unicode, which would presumably be
> UTF-16, then recode that to something else (possibly UTF-8) with GNU
> iconv.

UTF-16! That's something I haven't tried!
I'll try an iconv conversion from UTF-16 to UTF-8 tomorrow!
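
For reference, the UTF-16 to UTF-8 recoding discussed above can also be sketched in Python instead of iconv. This is only an illustration under assumptions: the function name `recode_utf16_to_utf8` is hypothetical, and it assumes the exported file really is UTF-16 and starts with a byte order mark, which the thread has not confirmed for the DTS output.

```python
import shutil


def recode_utf16_to_utf8(src_path, dst_path):
    """Copy a UTF-16 text file to dst_path, re-encoded as UTF-8.

    Assumes the source starts with a byte order mark so that the
    "utf-16" codec can pick the byte order. newline="" keeps the
    original line endings (e.g. CRLF) unchanged.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)


# Why a UTF-16 file would trip a single-byte client_encoding:
# every ASCII character is followed (or preceded) by a 0x00 byte.
sample = "abc".encode("utf-16-le")  # b'a\x00b\x00c\x00'
```

If the file was produced this way, the recoded copy could then be loaded with `client_encoding` set to UTF8 rather than LATIN9.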

--
Arnaud
