pg_dump, pg_restore and UTF8: invalid byte sequence

From: <me(at)alternize(dot)com>
To: <pgsql-novice(at)postgresql(dot)org>
Subject: pg_dump, pg_restore and UTF8: invalid byte sequence
Date: 2006-10-17 01:20:31
Message-ID: 04e601c6f18a$70da8db0$6501a8c0@iwing
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

hi list

while trying to get pgsql 8.2b1 running, i'm already thumbling at the first steps...
source pgslq is 8.1.5 (win32, utf8) on localhost:5432, target is 8.2b1 (win32, utf8) on localhost:5433

to get the data from one instance to the other, i'm trying to use pg_dump & pg_restore:

C:\pgsql8.2beta1\bin\pg_dump.exe -Fc -Z 5 -U postgres -p 5432 -f d:\mydb.backup mydb
C:\pgsql8.2beta1\bin\pg_restore -d mydb -p 5433 -U postgres d:\mydb.backup

first of all, pg_restore does not seem to like if the destination database does not yet exist.
if you omit "-d mydb", pg_restore just dumps the mydb.backup content to the command line instead of restoring the data.
the parameter "-C" doesn't do its job either. the only way around this seems to manually create the target table before running pg_restore, despite CREATE DATABASE... is present in the mydb.backup. shouldn't pg_restore be able to create the db for me, with the exact same settings as in the dump?

when creating the db manually beforehand, i'm getting over 130 errors (mostly due to TSearch2 & co functions already present). the manual suggests to create the db from template0, but doing so will break restoration of tsearch2 (missing function errors appear) objects.

the other, more severe problem is not all UTF8-data being properly encoded/decoded. during the restore i'm getting this error:

pg_restore: [archiver (db)] Error from TOC entry 3430; 0 4131599 TABLE DATA posts postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xe3b66e
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY posts, line 1

as a result, the entire table is empty! shouldn't pg_dump encode the utf8 bytesequences?

also, regarding pg_restore, its quite troubling it has the same parameter-set as pg_dump. even if its probably the same tool in the background, shouldn't pg_restore hide the parameters that are just for dumping data and only showing the ones relevant for data restoring?

thanks,
thomas

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message me 2006-10-17 03:23:02 Re: pg_dump, pg_restore and UTF8: invalid byte sequence
Previous Message Yadnyesh Joshi 2006-10-16 17:02:12 Re: Using host variables -- segmentation fault