Skip site navigation (1) Skip section navigation (2)

Re: Differences in UTF8 between 8.0 and 8.1

From: Paul Lindner <lindner(at)inuus(dot)com>
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Differences in UTF8 between 8.0 and 8.1
Date: 2005-10-24 03:48:00
Message-ID: 20051024034800.GD27646@inuus.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On Sun, Oct 23, 2005 at 05:56:50AM -0000, Andrew - Supernews wrote:
> On 2005-10-22, Paul Lindner <lindner(at)inuus(dot)com> wrote:
> > I've generated dumps using pg_dump from 8.0 and 8.1.  Attempting to
> > restore these results in
> >
> >  Invalid UNICODE byte sequence detected near byte ...
> 
> What were the exact offending bytes?

Here's a cut and paste from emacs hexl-mode:

00000000: 3530 3833 6335 3038 330a 3c20 5641 4c55  5083c5083.< VALU
00000010: 4553 2028 3230 3235 3533 2c20 27c1 f9d4  ES (202553, '...
00000020: c2d0 c7d2 b927 2c20 0a2d 2d2d 0a3e 2056  .....', .---.> V
00000030: 414c 5545 5320 2832 3032 3535 332c 2027  ALUES (202553, '
00000040: d2b9 272c 200a 3136 3939 3432 6331 3639  ..', .169942c169
00000050: 3934 320a 3c20 5641 4c55 4553 2028 3833  942.< VALUES (83
00000060: 3031 352c 2027 b7ed a8c6 a448 272c 200a  015, '.....H', .
00000070: 2d2d 2d0a 3e20 5641 4c55 4553 2028 3833  ---.> VALUES (83
00000080: 3031 352c 2027 c6a4 4827 2c20 0a         015, '..H', .

This is of a minimal diff between a UTF8 scrubbed file and the
original dump.

It appears the offending bytes are:

  C1 F9 C2 D0 C7

and

  B7 ED A8


> > Question:
> >
> > Does the 8.1 Unicode sanity code accept the full set of characters
> > accepted by the 8.0 Unicode sanity code?
> 
> No. 8.0 and before accepted a lot of stuff that it should never have, and
> failed to accept stuff that it should have.

Thanks go out to John Hansen, he recommended to run the dump through iconv:

  iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql

This seems to strip out invalid UTF8 and will allow for a clean
import.  Someone should add this to the Release Notes/FAQ..


-- 
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner(at)inuus(dot)com

In response to

Responses

pgsql-hackers by date

Next:From: Sergey E. KoposovDate: 2005-10-24 04:22:28
Subject: broken link in 8.1 docs
Previous:From: Sergey E. KoposovDate: 2005-10-24 03:42:38
Subject: Re: Call for port reports

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group