Re: invalid byte sequence for encoding "UTF8"

From: Derrick Rice <derrick(dot)rice(at)gmail(dot)com>
To: BRUSSER Michael <Michael(dot)BRUSSER(at)3ds(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8"
Date: 2011-06-02 21:16:52
Message-ID: BANLkTin8OFMfAmd704OpU7wXnx81qwz4vA@mail.gmail.com
Lists: pgsql-general

That specific character sequence is a result of Unicode implementations
prior to 6.0 mixing with later implementations. See here:

http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character
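
To see why the server rejects these particular bytes, you can decode them by hand. The Python sketch below is a minimal illustration, not taken from the linked page. It applies the standard 3-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) to 0xedbebf. The result is U+DFBF, which lies in the UTF-16 surrogate range (U+D800 to U+DFFF), and surrogate code points are not allowed in well-formed UTF-8.

```python
# Minimal sketch: decode the bytes from the psql error message by hand.
RAW = bytes.fromhex("edbebf")  # the sequence reported as 0xedbebf


def decode_three_byte_utf8(b: bytes) -> int:
    """Combine a 3-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) into a code point.

    Assumes b already has that shape; it does not validate lead/continuation bytes.
    """
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


code_point = decode_three_byte_utf8(RAW)
is_surrogate = 0xD800 <= code_point <= 0xDFFF

# A strict UTF-8 decoder refuses surrogate code points, just as the server did.
try:
    RAW.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(f"U+{code_point:04X}", is_surrogate, strict_ok)  # U+DFBF True False
```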

You could replace that sequence with the UTF-8 encoding of the replacement
character U+FFFD (the bytes 0xEFBFBD) using `sed`, for example (if you are
using a plain-text dump format).
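
For a plain-text dump, the same substitution can also be sketched in Python instead of `sed`. This is a minimal sketch that assumes 0xEDBEBF is the only sequence you want to rewrite. The script name and arguments are hypothetical. It writes U+FFFD as its UTF-8 bytes and reports any lines that are still not valid UTF-8, such as other pasted-in junk, so you can inspect them by hand instead of having them silently altered.

```python
"""Minimal sketch: patch a plain-text pg_dump file before sourcing it with psql.

Assumptions (illustrative only): the dump is plain text, and 0xEDBEBF is the
only sequence to rewrite. Other invalid bytes are reported, not changed.
"""
import sys
from typing import List, Tuple

BAD = b"\xed\xbe\xbf"          # sequence rejected by the server (0xedbebf)
REPLACEMENT = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER


def patch_line(line: bytes) -> Tuple[bytes, bool]:
    """Return the line with BAD replaced, plus whether it is now valid UTF-8."""
    fixed = line.replace(BAD, REPLACEMENT)
    try:
        fixed.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return fixed, False
    return fixed, True


def patch_file(src: str, dst: str) -> List[int]:
    """Copy src to dst line by line and return the numbers of lines still invalid."""
    still_bad = []
    # Binary mode keeps every byte we do not touch exactly as it was.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for lineno, line in enumerate(fin, start=1):
            fixed, ok = patch_line(line)
            if not ok:
                still_bad.append(lineno)
            fout.write(fixed)
    return still_bad


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: patch_dump.py SRC DST", file=sys.stderr)
        return
    for n in patch_file(argv[1], argv[2]):
        print(f"line {n}: still not valid UTF-8", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

Checking line by line is safe because the newline byte 0x0A never occurs inside a multibyte UTF-8 sequence. It also keeps memory use flat on a large dump. Usage would be something like `python patch_dump.py old.sql new.sql` (file names hypothetical), after which new.sql is sourced with psql as before.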

On Thu, Jun 2, 2011 at 4:17 PM, BRUSSER Michael <Michael(dot)BRUSSER(at)3ds(dot)com> wrote:

> We are upgrading an old database (7.3.10 to 8.4.4). This involves
> running pg_dump on the old db and loading the data file into the new db.
> In case it matters, we do not use pg_restore; the dump file is just
> sourced with psql, and this is where I ran into a problem:
>
> psql: .../postgresql_archive.src/... ERROR: invalid byte sequence for
> encoding "UTF8": 0xedbebf
>
> HINT: This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> The server and client encodings are both Unicode. I think we may have
> some MS-Word markup pasted into the old database, and possibly other odd
> things. All this junk is found in the ‘text’ fields.
>
> I found a number of related postings, but did not see a good solution.
> Some folks suggested cleaning the data file prior to loading, while
> someone else did essentially the same thing on the database before
> dumping it.
>
> I am looking for advice, hopefully the “best technique” if there is one;
> any suggestion is appreciated.
>
> Thanks,
>
> Michael.
