Re: Bug in UTF8-Validation Code?

From:	Mark Dilger <pgsql(at)markdilger(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-04-03 15:47:14
Message-ID:	46127702.8060100@markdilger.com
Lists:	pgsql-hackers

Martijn van Oosterhout wrote:
> On Tue, Apr 03, 2007 at 11:43:21AM +0200, Albe Laurenz wrote:
>> IMHO this is the only good and intuitive way for CHR() and ASCII().
>
> Hardly. The comment earlier about mbtowc was much closer to the mark.
> And wide characters are defined as Unicode code points.
>
> Basically, CHR() takes a Unicode code point and returns that character
> in a string appropriately encoded. ASCII() does the reverse.
>
> Just about every multibyte encoding other than Unicode has the problem
> of not distinguishing between the code point and the encoding of it.
> Unicode is a collection of encodings based on the same set.
>
> Have a nice day,

Thanks for the feedback. Would you say that the way I implemented things in the
example code is correct for multibyte non-Unicode encodings? I don't see
how to avoid the endianness issue for those encodings.

mark
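
To illustrate the semantics Martijn describes above (CHR() maps a Unicode code
point to its encoded character, ASCII() does the reverse), here is a minimal
sketch in C, assuming UTF-8 is the target encoding. The names chr_utf8 and
ascii_utf8 are hypothetical; they are not taken from the PostgreSQL source or
from the example code mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): encode the Unicode code
 * point cp as UTF-8 into out. Returns the number of bytes written (1..4),
 * or 0 if cp is a surrogate or lies beyond U+10FFFF.
 */
size_t
chr_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical helper (not a PostgreSQL function): decode the first UTF-8
 * character of s (len bytes available) into *cp. Returns the number of bytes
 * consumed, or 0 for truncated, overlong, or otherwise invalid input.
 */
size_t
ascii_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes, indexed by n */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t      n;
    size_t      i;
    uint32_t    v;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)
    {
        n = 1;
        v = s[0];
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        n = 2;
        v = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        n = 3;
        v = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        n = 4;
        v = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation byte or invalid lead */
    if (len < n)
        return 0;               /* truncated sequence */
    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        v = (v << 6) | (uint32_t) (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates, and values beyond U+10FFFF */
    if (v < min_cp[n] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;
    *cp = v;
    return n;
}
```

Because UTF-8 is defined as a byte sequence, this round trip involves no
byte-order decisions. The sketch says nothing about non-Unicode multibyte
encodings, where the question of how to pack the bytes into an integer
remains open.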

In response to

Re: Bug in UTF8-Validation Code? at 2007-04-03 14:36:18 from Martijn van Oosterhout

Responses

Re: Bug in UTF8-Validation Code? at 2007-04-03 17:06:38 from Tom Lane
