Re: Bug in UTF8-Validation Code?

From:	Mark Dilger <pgsql(at)markdilger(dot)com>
To:	Mark Dilger <pgsql(at)markdilger(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-04-02 22:18:55
Message-ID:	4611814F.1070308@markdilger.com
Lists:	pgsql-hackers

Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql(at)markdilger(dot)com> writes:
>>>> pgsql=# select chr(14989485);
>>>> chr
>>>> -----
>>>> 中
>>>> (1 row)
>>
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>>
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>>
>> Not sure what to do in other multibyte encodings.
>
> "Not sure what to do in other multibyte encodings" was pretty much my
> rationale for this particular behavior. I standardized on network byte
> order because there are only two endiannesses to choose from, and the
> other seems to be a more surprising choice.
>
> I looked around on the web for a standard for how to convert an integer
> into a valid multibyte character and didn't find anything. Andrew,
> Supernews has said upthread that chr() is clearly wrong and needs to be
> fixed. If so, we need some clear definition of what "fixed" means.
>
> Any suggestions?
>
> mark

Since chr() is defined in oracle_compat.c, I decided to look at what Oracle
might do. See
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/functions18a.htm

It looks to me like they are doing the same thing that I did, though I don't
have Oracle installed anywhere to verify that. Is there a difference?
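
To make the two interpretations discussed above concrete, here is a minimal
sketch in Python. It is not the code from the patch and makes no claim about
what Oracle does. The function names chr_as_utf8_bytes and chr_as_code_point
are hypothetical, chosen only for this illustration.

```python
def chr_as_utf8_bytes(n: int) -> str:
    """Interpret n as a raw UTF-8 byte sequence in network (big-endian) order.

    Hypothetical helper: models the behavior described in this thread,
    where chr(14989485) yields the character encoded as bytes E4 B8 AD.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    length = (n.bit_length() + 7) // 8      # number of bytes needed to hold n
    raw = n.to_bytes(length, "big")         # most significant byte first
    return raw.decode("utf-8")              # raises UnicodeDecodeError if invalid


def chr_as_code_point(n: int) -> str:
    """Interpret n as a Unicode code point (the alternative Tom suggests)."""
    return chr(n)                           # raises ValueError above 0x10FFFF


# 14989485 == 0xE4B8AD, the UTF-8 encoding of U+4E2D.
print(hex(14989485))                              # 0xe4b8ad
print(hex(ord(chr_as_utf8_bytes(14989485))))      # 0x4e2d
print(hex(ord(chr_as_code_point(0x4E2D))))        # 0x4e2d
```

Under the byte-sequence reading the argument is 14989485; under the code-point
reading the same character is 19997 (0x4E2D), and 14989485 is out of range.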

mark
