Re: Differences in UTF8 between 8.0 and 8.1

From:	Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Differences in UTF8 between 8.0 and 8.1
Date:	2005-10-27 11:56:02
Message-ID:	slrndm1g2i.g61.andrew+nonews@trinity.supernews.net
Lists:	pgsql-hackers

On 2005-10-27, Paul Lindner <lindner(at)inuus(dot)com> wrote:
> On Mon, Oct 24, 2005 at 05:07:40AM -0000, Andrew - Supernews wrote:
>> I'm inclined to suspect that the whole sequence c1 f9 d4 c2 d0 c7 d2 b9
>> was never actually a valid utf-8 string, and that the d2 b9 is only valid
>> by coincidence (it's a Cyrillic letter from Azerbaijani). I know the 8.0
>> utf-8 check was broken, but I didn't realize it was quite so bad.
>
> Looking at the data it appears that it is a sequence of latin1
> characters. They all have the eighth bit set and all seem to pass the
> check.

In latin1 it comes out as total gibberish, so I think you'll find it is
actually in something else. Some googling suggests it is most likely in a
Chinese double-byte charset (GB2312).
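
One way to check this is to decode the eight bytes under each candidate encoding. What follows is a minimal sketch in Python, assuming the standard `utf-8`, `latin-1` and `gb2312` codecs that ship with CPython; the helper name `try_decodings` is made up for illustration and is not anything from the PostgreSQL source.

```python
# The byte sequence quoted in the thread.
RAW = bytes.fromhex("c1f9d4c2d0c7d2b9")


def try_decodings(raw, encodings=("utf-8", "latin-1", "gb2312")):
    """Strictly decode raw under each encoding; None marks a rejected input."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)  # errors="strict" is the default
        except UnicodeDecodeError:
            results[enc] = None
    return results


results = try_decodings(RAW)
for enc, text in results.items():
    print(f"{enc:8s} {text!r}")

# The last two bytes taken on their own do form a well-formed UTF-8 sequence.
tail = RAW[-2:].decode("utf-8")
print(f"d2 b9 as utf-8: U+{ord(tail):04X}")
```

Strict UTF-8 decoding rejects the string at its first byte, since 0xC1 can never appear in well-formed UTF-8, while d2 b9 on its own decodes to U+04B9. Latin-1 decoding cannot fail for any byte string, so succeeding there is no evidence either way; the only question is whether the resulting text is plausible. GB2312 decodes the same bytes as four double-byte characters.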

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

In response to

Re: Differences in UTF8 between 8.0 and 8.1 at 2005-10-27 00:59:51 from Paul Lindner
