Re: Bug in UTF8-Validation Code?

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: alvherre(at)commandprompt(dot)com, kleptog(at)svana(dot)org, pgsql(at)markdilger(dot)com, all(at)adv(dot)magwien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-04 15:33:40
Message-ID: 20070405.003340.123426498.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Right -- IMHO what we should be doing is reject any input to chr() which
> > is beyond plain ASCII (or maybe > 255), and create a separate function
> > (unicode_char() sounds good) to get a Unicode character from a code
> > point, converted to the local client_encoding per conversion_procs.
>
> Hm, I hadn't thought of that approach, but another idea is that the
> argument of chr() is *always* a unicode code point, and it converts
> to the current encoding. Do we really need a separate function?

To be honest, I don't really see why we need to rush to add such
Unicode-specialized functions at this point (I assume we are
referring to "Unicode" as ISO 10646). Limiting chr() to the ASCII
range is enough, I think.
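
Just to illustrate the kind of range check I mean, here is a rough C
sketch (the function name is made up; this is not the actual backend
code):

#include <stdio.h>

static int
chr_ascii(unsigned int code, char *out)
{
    if (code < 1 || code > 127)
        return -1;              /* reject anything beyond plain ASCII */
    out[0] = (char) code;
    out[1] = '\0';
    return 0;
}

int
main(void)
{
    char buf[2];

    if (chr_ascii(65, buf) == 0)
        printf("chr(65) = %s\n", buf);          /* prints "A" */
    if (chr_ascii(12354, buf) != 0)
        printf("chr(12354) rejected: beyond ASCII\n");
    return 0;
}

The point is simply that anything above 127 is refused, so chr()
never needs to know which database encoding is in effect.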

BTW, every encoding has its own charset. However, the relationship
between encoding and charset is not as simple as it is for Unicode.
For example, the encoding EUC_JP corresponds to multiple charsets,
namely ASCII, JIS X 0201, JIS X 0208 and JIS X 0212. So a function
which returns a "code point" is not very useful, since it lacks the
charset info. I think we need to continue the design discussion,
probably targeting 8.4, not 8.3.
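
To make the EUC_JP point concrete, here is a rough C sketch
(illustrative only, not PostgreSQL's encoding code) showing how the
lead byte of an EUC_JP sequence decides which of the four charsets
the character belongs to:

#include <stdio.h>

typedef unsigned char byte;

static const char *
eucjp_charset(const byte *s)
{
    if (s[0] < 0x80)
        return "ASCII";
    if (s[0] == 0x8E)          /* SS2: single-byte halfwidth kana */
        return "JIS X 0201";
    if (s[0] == 0x8F)          /* SS3: two more bytes follow */
        return "JIS X 0212";
    return "JIS X 0208";       /* lead byte 0xA1-0xFE, two bytes */
}

int
main(void)
{
    byte a[]     = {'A', 0};            /* ASCII 'A' */
    byte kana[]  = {0x8E, 0xB1, 0};     /* halfwidth katakana */
    byte kanji[] = {0xB0, 0xA1, 0};     /* a JIS X 0208 kanji */

    printf("%s\n", eucjp_charset(a));
    printf("%s\n", eucjp_charset(kana));
    printf("%s\n", eucjp_charset(kanji));
    return 0;
}

A bare number returned by a "code point" function would be ambiguous
among these charsets; without saying which charset it belongs to,
the value cannot be interpreted.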
--
Tatsuo Ishii
SRA OSS, Inc. Japan
