Re: From ASCII to UTF-8

From: gabor <gabor(at)nekomancer(dot)net>
To: Clodoaldo Pinto <clodoaldo(dot)pinto(at)gmail(dot)com>
Cc: "pgsql-general postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: From ASCII to UTF-8
Date: 2006-02-26 16:36:08
Message-ID: 4401D8F8.209@nekomancer.net
Lists: pgsql-general

Clodoaldo Pinto wrote:
> As part of a migration from 8.0 to 8.1 i want to convert the data from
> ASCII to UTF-8.
>
> I dumped the database with pg_dump (8.0) and tried to convert it with
> iconv, but it shows an error:
>
> $ iconv -f ASCII -t UTF-8 fahstats_data.dump -o fahstats_data_utf-8.dump
> iconv: illegal input sequence at position 71407864
>
> That position contains the decimal value 233:
>
> $ od -A d -j 71407864 -N 1 -t u1 fahstats_data.dump
> 71407864 233
> 71407865
>
> I could use pg_dump -E in 8.1 but it is in another machine with ADSL
> connection and the dump size is 1.8GB. It would take more than 12
> hours.
>
> How to install pg_dump 8.1 only? I tried to copy the executable and
> the libs but it did not work.
>

from what you wrote it seems that your dump contains non-ascii characters...

probably non-ascii data somehow got into your database, encoded as
something like iso-8859-1, iso-8859-15 or cp-1252 (if you are using
western-european stuff). in those encodings, 233 = é.
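
a quick way to check this for the byte that od reported is a few lines of python. this is only a sketch: the helper name decode_or_none is made up for illustration, and the codec names are the ones python's standard library uses, not necessarily the ones your iconv lists.

```python
# Byte value that od reported at the failing offset in the dump.
suspect = bytes([233])


def decode_or_none(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding.

    Hypothetical helper, written only for this illustration.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Try the strict ASCII codec and the three western-european candidates.
results = {
    enc: decode_or_none(suspect, enc)
    for enc in ("ascii", "iso-8859-1", "iso-8859-15", "cp1252")
}

for enc, text in results.items():
    print(f"{enc:12} -> {text!r}")
```

ascii rejects the byte (which is what iconv complained about), while the three single-byte encodings all decode it to the same character, so this byte alone cannot tell them apart.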

maybe you could try something like:
iconv -f ISO-8859-1 -t UTF-8 ....

please note that a conversion FROM these encodings practically always
succeeds, because they are single-byte encodings in which (nearly) every
byte value maps to some character. so a success does not mean that you
guessed the charset correctly. you will still have to check manually
that the resulting document contains the correct data.
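
to illustrate why success proves nothing, here is a small python sketch. the two sample bytes are made up for the example (they are not taken from your dump); they are picked because the candidate encodings disagree about them.

```python
# Hypothetical sample bytes, chosen because the three encodings disagree:
# 0xA4 is the currency sign in iso-8859-1 and cp1252 but the euro sign in
# iso-8859-15; 0x80 is the euro sign in cp1252 but a control character in
# the two iso encodings.
sample = bytes([0xA4, 0x80])

# None of these decodes raises an error, yet the results differ.
decodings = {
    enc: sample.decode(enc)
    for enc in ("iso-8859-1", "iso-8859-15", "cp1252")
}

for enc, text in decodings.items():
    print(f"{enc:12} -> {text!r}")

# Number of different results obtained from the same input bytes.
distinct_results = len(set(decodings.values()))
print("distinct results:", distinct_results)
```

every conversion goes through without complaint, but each one produces different text, so only a look at the converted data (names, accented words you recognise) tells you which guess was right.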

gabor
