Re: Bug in UTF8-Validation Code?

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>
Cc: Mark Dilger *EXTERN* <pgsql(at)markdilger(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-03 14:36:18
Message-ID: 20070403143618.GA5405@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 03, 2007 at 11:43:21AM +0200, Albe Laurenz wrote:
> IMHO this is the only good and intuitive way for CHR() and ASCII().

Hardly. The comment earlier about mbtowc was much closer to the mark.
And wide characters are defined as Unicode points.

Basically, CHR() takes a unicode point and returns that character
in a string appropriately encoded. ASCII() does the reverse.

Just about every multibyte encoding other than Unicode has the problem
of not distinguishing between the code point and the encoding of it.
Unicode is a collection of encodings based on the same set.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Goodaire 2007-04-03 14:47:24 "Garbled" postgres logs
Previous Message Luke Lonergan 2007-04-03 14:28:31 Re: Modifying TOAST thresholds