Re: Dumping in LATIN1 and restoring in UTF-8

From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Marco Bizzarri <marco(dot)bizzarri(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Dumping in LATIN1 and restoring in UTF-8
Date: 2006-07-06 06:49:34
Message-ID: 44ACB27E.6030303@wildenhain.de
Lists: pgsql-general

Marco Bizzarri wrote:
> Hi all.
>
> Here is my use case: I've an application which uses PostgreSQL as
> backend. Up to now, the database was encoded in SQL_ASCII or LATIN1.
> Now, we need to migrate to UTF-8.
>
> What we tried was to:
>
> 1) dump the database using pg_dump, in tar format (we had blobs);
> 2) modify the result, using some conversion tool (like recode);
> 3) destroy the old database;
> 4) recreate the database with the UNICODE setting;
> 5) restore the database using pg_restore.
>
> The result was not what I expected. pg_restore was using the
> LATIN1 client encoding for the strings, so the already converted
> data was treated as LATIN1 and encoded in UTF-8 a second time...
>
> The problem lay in the toc.dat file, which stated that the client
> encoding was LATIN1 instead of UTF-8.
>
> The solution in the end was to manually modify the toc.dat file,
> replacing the LATIN1 string with UTF-8 (plus a space to keep the
> length unchanged, since toc.dat is a binary file).
>
> Even though it worked for us, I wonder if there is any other way to
> accomplish the same result, at least to specify the encoding for the
> restore.

Yes, it's actually quite easy: you dump as you see fit, then
create the database with the encoding you want, restore without
creating the database, and you are done.
Restore sets the client encoding to what it actually was
in the dump data (in your case LATIN1), and the database
would be UTF-8, so postgres recodes automatically. No need
for iconv and friends.
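
To make the difference concrete, here is a minimal Python sketch of the
byte-level conversions involved. It only mimics what happens to the
bytes; it does not talk to PostgreSQL, and the sample string and
variable names are assumptions made up for the illustration.

```python
# Illustration only: mimics the encoding conversions, not PostgreSQL itself.
original = "citt\u00e0"  # sample text with one non-ASCII character (assumed)

# Bytes as they sit in a dump whose declared client encoding is LATIN1.
dump_bytes = original.encode("latin-1")

# Untouched dump: the bytes are interpreted using the declared client
# encoding (LATIN1) and converted once into the UTF-8 database.
stored_ok = dump_bytes.decode("latin-1").encode("utf-8")

# Dump recoded by hand to UTF-8 while it still declares LATIN1:
# the UTF-8 bytes are read as LATIN1 and converted a second time.
recoded = dump_bytes.decode("latin-1").encode("utf-8")
stored_bad = recoded.decode("latin-1").encode("utf-8")

print(stored_ok == original.encode("utf-8"))   # single conversion is correct
print(stored_bad == original.encode("utf-8"))  # double conversion is not
```

In the second case the single accented character ends up as two
characters in the database, which matches the "LATIN1 encoded in UTF-8"
effect described above.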

Regards
Tino
