
Re: invalid byte sequence for encoding "UNICODE": 0xd9

From: Eric Walstad <eric(at)ericwalstad(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UNICODE": 0xd9
Date: 2006-02-14 21:31:42
Message-ID: 200602141331.43643.eric@ericwalstad.com
Lists: sfpug
On Monday 13 February 2006 21:29, David Fetter wrote:
> On Mon, Feb 13, 2006 at 02:37:45PM -0800, Eric Walstad wrote:
> > Hi everyone,
> >
> > Question: How do I keep from receiving the subject error message when
> > loading data?
>
> I suspect you'll have to pass iconv over the dump file, as mentioned
> in the release notes.  You may have had the database encoded in that
> abomination hiding under the mask of SQL_ASCII, which isn't really an
> encoding.  It's more like "any byte string without a null byte in it" :P
>
> HTH :)
>
> Cheers,
> D


Thanks for pointing me in the right direction, David.

I found the relevant section of the release notes here:
<http://www.postgresql.org/docs/current/interactive/release-8-1.html#AEN72739>


I first split my big dump file into manageable chunks:

mkdir tmp
cd tmp
split -C 25000000 ../output.sql
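
A quick sanity check after splitting, sketched on the assumption that all
chunks share one prefix (split's default prefix is "x"): concatenating them in
glob order should reproduce the original file byte for byte. The function name
verify_split is made up for illustration.

```shell
# Hypothetical helper: succeed only if the chunks, concatenated in
# lexical (glob) order, are byte-identical to the original file.
verify_split() {
    original=$1
    prefix=$2
    cat "$prefix"* | cmp -s - "$original"
}
```

For example, run from inside tmp before any other files are created there:
verify_split ../output.sql x && echo "chunks match the original"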


Then I ran iconv on all the split files, using the command line suggested in 
the release notes:

for SPLIT_FILE in xa*
    do
        iconv -f UTF-8 -t UTF-8 "$SPLIT_FILE" >> converted.sql
    done


That, unfortunately, removed some other important bits of data (tabs, I think,
next to the invalid Unicode characters).  However, iconv did print a message
whenever it encountered an invalid character (with byte offsets, I think),
which told me which split files had problems and where.  I was then able to
open each of those split files in vim, delete the bad characters by hand, cat
all the split files back together, and load all the data successfully.
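
The detection step can also be done without writing any converted output, as
a rough sketch: iconv exits with a non-zero status when it hits a byte
sequence that is not valid in the source encoding, so a loop can list which
split files need attention. The function name check_utf8 is hypothetical, and
since the wording of iconv's own error messages varies between
implementations, the sketch relies only on the exit status.

```shell
# Hypothetical helper: report, per file, whether iconv accepts the whole
# file as valid UTF-8. Nothing is modified; output is discarded.
check_utf8() {
    for f in "$@"; do
        if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            echo "ok: $f"
        else
            echo "INVALID: $f"
        fi
    done
}
```

Usage would look like: check_utf8 xa*
Dropping the 2>&1 lets iconv's own message through, which is what points at
the position of the offending bytes.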


My postgresql.conf file has the encoding line commented out:

#client_encoding = sql_ascii    # actually, defaults to database encoding

No database encoding was specified when I created the database with createdb.
I suspect that means 'sql_ascii' was used, but I couldn't find where the
default database encoding is specified, so I don't know for sure.
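
The server can report a database's encoding directly, so it does not have to
be inferred from postgresql.conf (client_encoding there only concerns the
client side of the connection). A small sketch, assuming psql can connect as
the current user; the function name show_db_encoding and the database name
mydb are placeholders:

```shell
# Hypothetical helper: print the encoding of the given database.
# -A (unaligned) and -t (tuples only) strip psql's table decoration.
show_db_encoding() {
    psql -A -t -d "$1" -c 'SHOW server_encoding;'
}
```

Usage: show_db_encoding mydb
Running psql -l also lists every database along with its encoding. As far as
I know, createdb without -E takes the encoding of the template database,
which is fixed when the cluster is created with initdb; passing -E to
createdb (for example, createdb -E UTF8 mydb) overrides it.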


Thanks again,


Eric.
