Re: pg_dump, pg_restore and UTF8: invalid byte sequence

From: <me(at)alternize(dot)com>
To: <me(at)alternize(dot)com>, <pgsql-novice(at)postgresql(dot)org>
Subject: Re: pg_dump, pg_restore and UTF8: invalid byte sequence
Date: 2006-10-17 03:23:02
Message-ID: 054401c6f19b$8e5c8210$6501a8c0@iwing
Lists: pgsql-novice

> shouldn't pg_dump encode the utf8 bytesequences?

at least i found out why the invalid unicode sequences appear in the first
place: tsearch2 in 8.1 doesn't properly handle utf8 characters: the
character's 2-byte representation is converted to lowercase byte for byte.
for example: "ä", which is encoded as the bytes 0xC3 0xA4 ("Ã¤"), is written
to the db by tsearch2 as 0xE3 0xA4 ("ã¤"), which is an invalid utf8 byte
sequence.
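
a minimal sketch of the effect in python, not tsearch2's actual code. it
assumes the byte-for-byte lowercasing follows latin1 rules (uppercase range
0xC0-0xDE maps to 0xE0-0xFE); the function names are made up for illustration.

```python
def bytewise_lower_latin1(data: bytes) -> bytes:
    """Lowercase each byte on its own, as a single-byte (latin1) tolower would.

    Assumption: latin1 case rules. This ignores multi-byte utf8 sequences,
    which is exactly what corrupts them.
    """
    out = bytearray()
    for b in data:
        is_ascii_upper = 0x41 <= b <= 0x5A
        # 0xD7 is the multiplication sign and has no lowercase form
        is_latin1_upper = 0xC0 <= b <= 0xDE and b != 0xD7
        if is_ascii_upper or is_latin1_upper:
            b += 0x20
        out.append(b)
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as utf8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "ä".encode("utf-8")             # lead byte 0xC3, continuation 0xA4
mangled = bytewise_lower_latin1(original)  # lead byte becomes 0xE3

print(original.hex(), is_valid_utf8(original))
print(mangled.hex(), is_valid_utf8(mangled))
```

0xE3 announces a 3-byte sequence, but only one continuation byte follows, so
the result can no longer be decoded as utf8.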

stripping the ts2 index column before dumping fixes the encoding problems. i
guess the 8.2 -> 8.1.5 backport should fix it as well, i'll try asap.

> also, regarding pg_restore, its quite troubling it has the same
> parameter-set as pg_dump

never mind this, it is too late in the evening 8-)

- thomas
