Re: Java's Unicode Notation

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: jm(dot)poure(at)freesurf(dot)fr
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Java's Unicode Notation
Date: 2001-11-11 10:04:22
Message-ID: 20011111190422Y.t-ishii@sra.co.jp
Lists: pgsql-hackers

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
Subject: Java's Unicode Notation
Date: Thu, 08 Nov 2001 14:12:04 +0100
Message-ID: <4(dot)2(dot)0(dot)58(dot)20011108141018(dot)00a59dc0(at)pop(dot)freesurf(dot)fr>

> Dear Tatsuo,
>
> Could it be possible to use the Java Unicode Notation to define UTF-8
> strings in PostgreSQL 7.2?

No. It's too late. We are in the beta freeze stage.

> Information can be found on http://czyborra.com/utf/
>
> Do you think it is hard to implement?
>
> Best regards,
> Jean-Michel POURE
>
> ************************************************
> Java's Unicode Notation
> There are some less compact but more readable ASCII transformations the
> most important of which is the Java Unicode Notation as allowed in Java
> source code and processed by Java's native2ascii converter:
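> #include <stdio.h>
>
> /* Print code point c in Java's \uXXXX notation. */
> void putwchar(unsigned int c)
> {
>     if (c >= 0x10000) {
>         /* Beyond U+FFFF: emit a high and a low surrogate. */
>         printf("\\u%04x\\u%04x", 0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
>     }
>     else if (c >= 0x100) printf("\\u%04x", c);
>     else putchar(c);
> }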
> The advantage of the \u20ac notation is that it is very easy to type it in
> on any old ASCII keyboard and easy to look up the intended character if you
> happen to have a copy of the Unicode book or the
> {unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM or
> know that U+20AC is the € sign.
> What's not so nice about the \u20ac notation is that the small letters are
> quite unusual for Unicode characters, the backslashes have to be quoted for
> many Unix tools, the four hexdigits without a terminator may appear merged
> with the following word as in \u00a333 for £33, it is unclear when and how
> you have to escape the backslash character itself, 6 bytes for one
> character may be considered wasteful, and there is no way to clearly
> present the characters beyond \uffff without \ud800\udc00 surrogates, and
> last but not least the plain hexnumbers may not be very helpful.
> JAVA is one of the target and source encodings of yudit and its uniconv
> converter.
>
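
For a rough idea of what the reverse direction involves, here is a minimal sketch in C that turns \uXXXX escapes into UTF-8. It is not PostgreSQL code: the names hex_value, parse_escape, put_utf8 and java_unescape_to_utf8 are made up for illustration. It assumes every byte outside an escape is plain ASCII (or already UTF-8), it does not handle an escaped backslash, which the quoted text already notes is an unclear case, and it passes unpaired surrogates through without validating them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Parse "\uXXXX" at s; return its 16-bit value, or -1 if s does not
 * start with a complete escape. Stops at the terminating '\0'. */
static long parse_escape(const char *s)
{
    long v = 0;
    int i;

    if (s[0] != '\\' || s[1] != 'u') return -1;
    for (i = 2; i < 6; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return -1;
        v = (v << 4) | d;
    }
    return v;
}

/* Write code point cp as UTF-8 to out; return the number of bytes (1-4). */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode Java-style escapes in `in` into UTF-8 in `out` and return the
 * output length. `out` must hold at least strlen(in) + 1 bytes; the
 * output is never longer than the input. */
size_t java_unescape_to_utf8(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in) {
        long hi = parse_escape(in);

        if (hi < 0) {               /* not an escape: copy the byte as-is */
            out[n++] = *in++;
            continue;
        }
        in += 6;
        if (hi >= 0xD800 && hi <= 0xDBFF) {
            long lo = parse_escape(in);

            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                /* Inverse of the encoder's 0xD7C0 + (c >> 10) split. */
                unsigned long cp =
                    (unsigned long)(((hi - 0xD7C0) << 10) | (lo & 0x3FF));
                n += put_utf8(cp, out + n);
                in += 6;
                continue;
            }
        }
        n += put_utf8((unsigned long)hi, out + n);
    }
    out[n] = '\0';
    return n;
}
```

Because the escape always takes exactly four hex digits, the input \u00a333 comes out as the two-byte UTF-8 sequence for £ followed by the ASCII characters 33, which is the "merged with the following word" case described above.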
