Mark Dilger wrote:
>>> In particular, in UTF8 land I'd have expected the argument of chr()
>>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>>> with a randomly-chosen endianness.
>>> Not sure what to do in other multibyte encodings.
>> "Not sure what to do in other multibyte encodings" was pretty much my
>> rationale for this particular behavior. I standardized on network
order because there are only two endiannesses to choose from, and the
>> other seems to be a more surprising choice.
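The network-order interpretation described above can be sketched in a few lines (a hypothetical `chr_bytes` helper, assuming a UTF-8 server encoding; an illustration of the described behavior, not the actual patch code):

```python
def chr_bytes(n: int) -> str:
    """Interpret the integer as raw encoded bytes in network
    (big-endian) order and decode them as UTF-8."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big").decode("utf-8")

# 14844588 == 0xE282AC, the UTF-8 byte sequence of the Euro sign
print(chr_bytes(14844588))  # €
print(chr_bytes(0x41))      # A
```

With the opposite (little-endian) choice, 14844588 would be read as the byte sequence AC 82 E2, which is not even valid UTF-8, which is one way to see why network order is the less surprising pick.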
> Since chr() is defined in oracle_compat.c, I decided to look
> at what Oracle might do. See
> It looks to me like they are doing the same thing that I did,
> though I don't have Oracle installed anywhere to verify that.
> Is there a difference?
This is Oracle 10.2.0.3.0 ("latest and greatest") with UTF-8 encoding
(actually, Oracle chooses to call this encoding AL32UTF8):
SQL> SELECT ASCII('€') AS DEC,
  2         TO_CHAR(ASCII('€'), 'XXXXXX') AS HEX
  3  FROM DUAL;

       DEC HEX
---------- -------
  14844588  E282AC

SQL> SELECT CHR(14844588) AS EURO FROM DUAL;

EURO
----
€
I don't see how endianness enters into this at all - isn't that just
a question of how a multi-byte value is stored physically?
According to RFC 2279, the Euro sign,
Unicode code point 0x20AC = 0010 0000 1010 1100,
is encoded as 1110 0010 1000 0010 1010 1100 = 0xE282AC.
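The bit layout above can be verified with a short sketch (plain Python, just to illustrate the three-byte UTF-8 form from RFC 2279):

```python
# UTF-8 three-byte form: 1110xxxx 10xxxxxx 10xxxxxx
cp = 0x20AC  # Unicode code point of the Euro sign
encoded = bytes([
    0xE0 | (cp >> 12),          # 1110 0010 = 0xE2
    0x80 | ((cp >> 6) & 0x3F),  # 1000 0010 = 0x82
    0x80 | (cp & 0x3F),         # 1010 1100 = 0xAC
])
print(encoded.hex().upper())                # E282AC
print("\u20ac".encode("utf-8") == encoded)  # True
```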
IMHO this is the only good and intuitive way for CHR() and ASCII().