Re: Bug in UTF8-Validation Code?

From:	Mark Dilger <pgsql(at)markdilger(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-04-02 22:02:21
Message-ID:	46117D6D.7050705@markdilger.com
Lists:	pgsql-hackers

Tom Lane wrote:
> Mark Dilger <pgsql(at)markdilger(dot)com> writes:
>>> pgsql=# select chr(14989485);
>>> chr
>>> -----
>>> 中
>>> (1 row)
>
> Is there a principled rationale for this particular behavior as
> opposed to any other?
>
> In particular, in UTF8 land I'd have expected the argument of chr()
> to be interpreted as a Unicode code point, not as actual UTF8 bytes
> with a randomly-chosen endianness.
>
> Not sure what to do in other multibyte encodings.

"Not sure what to do in other multibyte encodings" was pretty much my rationale
for this particular behavior. I standardized on network byte order because
there are only two endiannesses to choose from, and the other seems to be the
more surprising choice.
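
To make the two readings concrete, here is a rough Python sketch (not the PostgreSQL implementation; the function names are made up for illustration). It treats the integer either as the encoded bytes in network byte order, which is the behavior shown in the chr(14989485) example above, or as a Unicode code point, which is what Tom says he would have expected in UTF8:

```python
def chr_from_encoded_bytes(n: int, byteorder: str = "big") -> str:
    """Hypothetical helper: read n as raw UTF-8 bytes in the given byte order."""
    length = max(1, (n.bit_length() + 7) // 8)   # minimal number of bytes for n
    raw = n.to_bytes(length, byteorder=byteorder)
    return raw.decode("utf-8")                   # UnicodeDecodeError if invalid


def chr_from_code_point(n: int) -> str:
    """Hypothetical helper: read n as a Unicode code point."""
    return chr(n)                                # ValueError if n > 0x10FFFF


if __name__ == "__main__":
    # 14989485 == 0xE4B8AD, i.e. the bytes E4 B8 AD, which is U+4E2D in UTF-8.
    print(chr_from_encoded_bytes(14989485))      # network byte order
    print(chr_from_code_point(0x4E2D))           # same character via code point

    # The little-endian reading gives bytes AD B8 E4, which is not valid UTF-8.
    try:
        chr_from_encoded_bytes(14989485, byteorder="little")
    except UnicodeDecodeError as exc:
        print("little-endian reading rejected:", exc.reason)

    # Read as a code point, 14989485 is beyond the Unicode range.
    try:
        chr_from_code_point(14989485)
    except ValueError:
        print("14989485 is not a valid code point")
```

Under the byte-order reading the same integer means different things in different encodings, while the code point reading only has an obvious meaning when the server encoding is UTF8.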

I looked around on the web for a standard for how to convert an integer into a
valid multibyte character and didn't find anything. Andrew (Supernews) has said
upthread that chr() is clearly wrong and needs to be fixed. If so, we need some
clear definition of what "fixed" means.

Any suggestions?

mark

In response to

Re: Bug in UTF8-Validation Code? at 2007-04-02 22:37:11 from Tom Lane

Responses

Re: Bug in UTF8-Validation Code? at 2007-04-02 22:05:27 from Mark Dilger
Re: Bug in UTF8-Validation Code? at 2007-04-02 22:18:55 from Mark Dilger