Re: unicode error and problem

From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: Paolo Supino <paolo(at)telmap(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unicode error and problem
Date: 2004-03-24 20:49:09
Message-ID: 1080161348.1988.6.camel@yarrow.bertheau.de
Lists: pgsql-general pgsql-hackers

On Wed, 24.03.2004, at 11:33, Paolo Supino wrote:
> Hi
>
> I received a Unicode CSV file from someone (the file was created on a
> Windows system) and I'm trying to import it into PostgreSQL. When it gets to
> a line that isn't ASCII it prints the following error and aborts: "ERROR:
> copy: line 33, Invalid UNICODE character sequence found (0xd956)".

Try converting the file from UTF-16 (which might be the file's encoding)
to UTF-8 with iconv:

iconv -f UTF-16 -t UTF-8 file > file.UTF-8

If the file is not in UTF-16 but in some other encoding, change the
source encoding in the command accordingly.
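
If it is unclear which encoding the file is in, a byte-order mark at the
start of the file can give a hint. The following is only a minimal sketch
of that idea in Python, not something from the original report: the
function names guess_bom_encoding and to_utf8 are made up for
illustration, and the cp1252 fallback is purely an assumption for files
without a byte-order mark.

```python
import codecs
from typing import Optional


def guess_bom_encoding(raw: bytes) -> Optional[str]:
    """Return a Python codec name based on a leading byte-order mark, or None."""
    # UTF-32 must be checked first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Decode raw using the BOM hint (or the assumed fallback) and re-encode as UTF-8."""
    encoding = guess_bom_encoding(raw) or fallback
    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM while decoding.
    return raw.decode(encoding).encode("utf-8")
```

A file without a byte-order mark cannot be identified this way, so the
fallback should be replaced with whatever encoding the sender actually used.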

By the way, Unicode is just a mapping between characters and numbers
(code points); it says nothing about how those numbers are represented
in the byte stream. UTF-8 and UTF-16 are specifications of such
representations.
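
To illustrate the difference between the number and its byte
representation, here is a small Python sketch. The euro sign is just an
arbitrarily chosen example character, and is_valid_utf8 is a helper name
invented for this illustration.

```python
ch = "\u20ac"                      # EURO SIGN, chosen only as an example
code_point = ord(ch)               # the Unicode number: 0x20AC

utf8 = ch.encode("utf-8")          # three bytes: e2 82 ac
utf16le = ch.encode("utf-16-le")   # two bytes:   ac 20
utf16be = ch.encode("utf-16-be")   # two bytes:   20 ac


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(f"U+{code_point:04X}")
print("UTF-8    :", utf8.hex(" "), is_valid_utf8(utf8))
print("UTF-16-LE:", utf16le.hex(" "), is_valid_utf8(utf16le))
print("UTF-16-BE:", utf16be.hex(" "), is_valid_utf8(utf16be))
```

The same code point yields different byte sequences, and the UTF-16
bytes are not a valid UTF-8 sequence, which is the kind of mismatch that
makes a UTF-8 reader reject the input.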

The encoding name in PostgreSQL should be changed from UNICODE to UTF-8
because UNICODE really just isn't an encoding.

--
Markus Bertheau <twanger(at)bluetwanger(dot)de>
