Re: Bug in UTF8-Validation Code?

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-02 22:18:55
Lists: pgsql-hackers
Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql(at)markdilger(dot)com> writes:
>>>> pgsql=# select chr(14989485);
>>>> chr
>>>> -----
>>>> 中
>>>> (1 row)
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>> Not sure what to do in other multibyte encodings.
> "Not sure what to do in other multibyte encodings" was pretty much my 
> rationale for this particular behavior.  I standardized on network byte 
> order because there are only two endiannesses to choose from, and the 
> other seems to be a more surprising choice.
> I looked around on the web for a standard for how to convert an integer 
> into a valid multibyte character and didn't find anything.  Andrew - 
> Supernews has said upthread that chr() is clearly wrong and needs to be 
> fixed. If so, we need some clear definition of what "fixed" means.
> Any suggestions?
> mark

Since chr() is defined in oracle_compat.c, I decided to look at what Oracle 
might do.  See

It looks to me like they are doing the same thing that I did, though I don't 
have Oracle installed anywhere to verify that.  Is there a difference?
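To make the two readings of chr() in this thread concrete, here is a minimal sketch in Python (not PostgreSQL's C). The function names are hypothetical, and the only thing assumed is the standard UTF-8 bit layout from RFC 3629. The argument in the example, 14989485, is 0xE4B8AD, and the bytes E4 B8 AD happen to be the UTF-8 encoding of U+4E2D (中). That is why the current byte-oriented, network-order behavior prints that character. Under the code-point reading Tom suggests, the same character would be chr(20013), and 14989485 itself lies above U+10FFFF, so it would be rejected.

```python
# Hypothetical helpers contrasting the two readings of chr(n) discussed above.
# Neither function is PostgreSQL code; the names are illustrative only.

def chr_from_utf8_bytes(n: int) -> str:
    """Read n as raw UTF-8 bytes in network (big-endian) byte order."""
    if n <= 0:
        raise ValueError("n must be positive")
    length = (n.bit_length() + 7) // 8      # minimal number of bytes needed
    raw = n.to_bytes(length, "big")         # most significant byte first
    return raw.decode("utf-8")              # raises if not valid UTF-8


def encode_code_point(cp: int) -> bytes:
    """Encode a Unicode code point as UTF-8 using the RFC 3629 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:                           # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                          # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),        # 4 bytes: 11110xxx + 3 continuations
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def chr_from_code_point(n: int) -> str:
    """Read n as a Unicode code point and return that character."""
    return encode_code_point(n).decode("utf-8")


if __name__ == "__main__":
    n = 14989485                                  # the value from the psql example
    print(hex(n))                                 # 0xe4b8ad
    print(chr_from_utf8_bytes(n))                 # 中, from bytes E4 B8 AD
    print(hex(ord(chr_from_utf8_bytes(n))))       # 0x4e2d, its code point
    print(chr_from_code_point(0x4E2D))            # 中 again, via the code-point reading
    # Little-endian order gives AD B8 E4, which is not valid UTF-8:
    try:
        n.to_bytes(3, "little").decode("utf-8")
    except UnicodeDecodeError:
        print("little-endian bytes are invalid UTF-8")
```

Reversing the byte order yields AD B8 E4. Because that sequence starts with a continuation byte, it is not valid UTF-8, so only the big-endian choice makes this particular example decode at all. In other multibyte encodings the byte-oriented reading would still apply, but the code-point reading has no single obvious equivalent.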
