On Monday 13 February 2006 21:29, David Fetter wrote:
> On Mon, Feb 13, 2006 at 02:37:45PM -0800, Eric Walstad wrote:
> > Hi everyone,
> > Question: How do I keep from receiving the subject error message
> > (invalid byte sequence for encoding "UNICODE": 0xd9) when loading data?
> I suspect you'll have to pass iconv over the dump file, as mentioned
> in the release notes. You may have had the database encoded in that
> abomination hiding under the mask of SQL_ASCII, which isn't really an
> encoding. It's more like "any byte string without a null byte in it" :P
> HTH :)
Thanks for pointing me in the right direction, David.
I found the relevant section of the release notes here:
I first split my big dump file into manageable chunks:
split -C 25000000 ../output.sql
Then I ran iconv on all the split files, using the command line suggested in
the release notes:
for SPLIT_FILE in xa*; do
    iconv -f UTF-8 -t UTF-8 "$SPLIT_FILE" >> converted.sql
done
That, unfortunately, removed some other important bits of data (tabs, I think,
next to the invalid unicode characters). However, iconv did output messages
when it encountered the invalid characters (with byte offsets, I think) which
told me where the problems were located and in which split files. I was then
able to go into each split file and delete the characters by hand with vim,
cat all the split files back together and load all the data successfully.
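For anyone repeating this, the hand search can be narrowed down before opening vim. The following is only a sketch, not what was run above: check_chunks and bad_lines are hypothetical helper names, and it assumes an iconv that exits non-zero on invalid input (glibc's does). All converted output goes to /dev/null, so the chunks themselves are left untouched.

```shell
#!/bin/sh
# Hypothetical helpers; assumes iconv exits non-zero on invalid UTF-8.

# Print the name of every file that is not valid UTF-8.
check_chunks() {
    for f in "$@"; do
        if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}

# Print the line numbers in file $1 that contain invalid UTF-8.
bad_lines() {
    n=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
}
```

Usage would be something like `check_chunks xa*`, followed by `bad_lines xab` for each chunk it reports. The line-by-line pass starts one iconv process per line, so it is slow on 25 MB chunks; it is meant only for the few chunks that fail the first check.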
My postgresql.conf file has the encoding line commented out:
#client_encoding = sql_ascii # actually, defaults to database encoding
No database encoding was specified when I created the database with createdb.
I suspect that means 'sql_ascii' was used, but I didn't find where the
default database encoding is specified so I don't know for sure.
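The encoding a database actually uses can be read back from the server rather than guessed. A minimal sketch, assuming psql is on the PATH and can connect with default credentials; db_encoding is a hypothetical helper name and mydb below is a placeholder database name. Typically a database created without an explicit encoding inherits the cluster default chosen when initdb was run, but that is worth verifying against the documentation for the server version in use.

```shell
#!/bin/sh
# Hypothetical helper: print the encoding of the database named in $1.
# -A (unaligned) and -t (tuples only) make psql print just the value.
db_encoding() {
    psql -At -d "$1" -c 'SHOW server_encoding;'
}
```

For example, `db_encoding mydb` should print a single encoding name such as SQL_ASCII. Running `psql -l` also lists every database in the cluster together with its encoding.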