
Differences in UTF8 between 8.0 and 8.1

From:	Paul Lindner <lindner(at)inuus(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Differences in UTF8 between 8.0 and 8.1
Date:	2005-10-22 15:48:27
Message-ID:	20051022154827.GC27646@inuus.com
Lists:	pgsql-hackers

I've been doing some test imports of UNICODE databases into Postgres
8.1beta3. The only problem I've seen is that some data from 8.0
databases will not import.

I've generated dumps using pg_dump from 8.0 and 8.1. Attempting to
restore them into 8.1 results in

Invalid UNICODE byte sequence detected near byte ...

Question:

Does the 8.1 Unicode sanity code accept the full set of characters
accepted by the 8.0 Unicode sanity code?

If not, we'll see a lot of problems like the one above.

I believe this patch is the one causing the problem I see:

http://www.mail-archive.com/pgsql-patches(at)postgresql(dot)org/msg08198/unicode.diff

Is there any solution other than scrubbing the entire dataset to
conform to the new (8.1) encoding rules?
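A minimal sketch of the "scrubbing" approach might start by locating the byte sequences in a dump that a strict UTF-8 check rejects. This assumes the 8.1 checks roughly match strict UTF-8 as implemented by Python's decoder (no overlong forms, no surrogates, nothing above U+10FFFF); the exact rules in the linked patch may differ. The function name find_invalid_utf8 and the command-line usage are hypothetical.

```python
import sys
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, offending_bytes) for each span strict UTF-8 decoding rejects.

    Assumption: Python's strict decoder approximates the 8.1 validity rules.
    """
    problems: List[Tuple[int, bytes]] = []
    pos = 0
    while pos < len(data):
        try:
            # The strict decoder rejects overlong forms, surrogates
            # (U+D800..U+DFFF) and code points above U+10FFFF.
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice we decoded.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning after the rejected bytes
    return problems


def main(paths: List[str]) -> None:
    # Hypothetical usage: pass one or more pg_dump output files.
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        for offset, bad in find_invalid_utf8(data):
            print(f"{path}: invalid UTF-8 near byte {offset}: {bad.hex()}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Whether each reported sequence should then be dropped, replaced, or transcoded (for example, if it is really Latin-1 text) depends on where the data came from.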

--
Paul Lindner ||||| | | | | | | | | |
lindner(at)inuus(dot)com

Responses

Re: Differences in UTF8 between 8.0 and 8.1 at 2005-10-23 05:56:50 from Andrew - Supernews
