Java's Unicode Notation

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Java's Unicode Notation
Date: 2001-11-07 20:45:42
Message-ID: 4.2.0.58.20011107214231.00a89aa0@pop.freesurf.fr
Lists: pgsql-hackers

Dear all,

Would it be possible to use the Java Unicode Notation to define UTF-8
strings in PostgreSQL 7.2?
More information can be found at http://czyborra.com/utf/

Best regards,
Jean-Michel POURE

************************************************

Java's Unicode Notation
There are some less compact but more readable ASCII transformations, the
most important of which is the Java Unicode Notation, as allowed in Java
source code and processed by Java's native2ascii converter:

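#include <stdio.h>

/* Write code point c in Java's \uXXXX notation. Code points above U+FFFF
   become a UTF-16 surrogate pair; values below 0x100 are written raw. */
void putwchar(unsigned long c)
{
    if (c >= 0x10000) {
        /* 0xD7C0 + (c >> 10) equals 0xD800 + ((c - 0x10000) >> 10) */
        printf("\\u%04lx\\u%04lx", 0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
    }
    else if (c >= 0x100) printf("\\u%04lx", c);
    else putchar((int)c);
}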
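
As a rough illustration of the arithmetic in the routine above, the same
logic can be written to a buffer so the result is easy to inspect. This is
only a sketch: the name java_escape and its signature are hypothetical and
do not come from the quoted page.

```c
#include <stdio.h>

/* Hypothetical helper (not from the quoted page): format code point c in
 * Java's \uXXXX notation into buf. Returns the number of characters
 * written, not counting the terminating NUL. */
static int java_escape(unsigned long c, char *buf, size_t size)
{
    if (c >= 0x10000) {
        /* 0xD7C0 is 0xD800 - (0x10000 >> 10), so hi is the same as
         * 0xD800 + ((c - 0x10000) >> 10). */
        unsigned long hi = 0xD7C0 + (c >> 10);
        unsigned long lo = 0xDC00 | (c & 0x3FF);
        return snprintf(buf, size, "\\u%04lx\\u%04lx", hi, lo);
    }
    if (c >= 0x100)
        return snprintf(buf, size, "\\u%04lx", c);
    /* Below 0x100 the character is passed through unchanged. */
    return snprintf(buf, size, "%c", (int)c);
}
```

With a 16-byte buffer, java_escape(0x20AC, buf, sizeof buf) leaves \u20ac
in buf, and java_escape(0x10000, buf, sizeof buf) leaves \ud800\udc00.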

The advantage of the \u20ac notation is that it is very easy to type on
any old ASCII keyboard, and it is easy to look up the intended character if
you happen to have a copy of the Unicode book or the
{unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM, or
if you already know that U+20AC is the €.

What's not so nice about the \u20ac notation is that lowercase hex digits
are quite unusual for Unicode characters; the backslashes have to be quoted
for many Unix tools; the four hex digits have no terminator and may appear
merged with the following word, as in \u00a333 for £33; it is unclear when
and how you have to escape the backslash character itself; six bytes for
one character may be considered wasteful; there is no way to clearly
present the characters beyond \uffff without \ud800\udc00 surrogates; and,
last but not least, the plain hex numbers may not be very helpful.
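
Since the question is about producing UTF-8 from this notation, here is a
minimal decoding sketch. It shows two of the points above: exactly four hex
digits are consumed, so \u00a333 decodes to £ followed by "33", and a
\ud800\udc00 style pair is combined into one code point. The function names
are hypothetical. The sketch assumes unescaped bytes are Latin-1, matching
the encoder that passes values below 0x100 through. It does not handle
escaped backslashes or reject unpaired surrogates.

```c
#include <stddef.h>

/* Parse exactly four hex digits at s; return the value, or -1 if any of
 * the four characters (including a terminating NUL) is not a hex digit. */
static long hex4(const char *s)
{
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char ch = s[i];
        int d;
        if (ch >= '0' && ch <= '9') d = ch - '0';
        else if (ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Write code point c to out as UTF-8; return the number of bytes written. */
static size_t put_utf8(unsigned long c, char *out)
{
    if (c < 0x80) {
        out[0] = (char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (char)(0xC0 | (c >> 6));
        out[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (char)(0xE0 | (c >> 12));
        out[1] = (char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (c >> 18));
    out[1] = (char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (char)(0x80 | (c & 0x3F));
    return 4;
}

/* Hypothetical decoder: convert Java notation in `in` to NUL-terminated
 * UTF-8 in `out`. The caller must provide at least 2 * strlen(in) + 1
 * bytes, because each unescaped Latin-1 byte may grow to two bytes. */
static void java_unescape(const char *in, char *out)
{
    while (*in) {
        long hi, lo;
        if (in[0] == '\\' && in[1] == 'u' && (hi = hex4(in + 2)) >= 0) {
            in += 6;  /* backslash, 'u', and exactly four hex digits */
            /* A high surrogate followed by an escaped low surrogate
             * forms a single code point beyond U+FFFF. */
            if (hi >= 0xD800 && hi <= 0xDBFF
                && in[0] == '\\' && in[1] == 'u'
                && (lo = hex4(in + 2)) >= 0xDC00 && lo <= 0xDFFF) {
                hi = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
                in += 6;
            }
            out += put_utf8((unsigned long)hi, out);
        } else {
            /* Anything else is treated as a Latin-1 character. */
            out += put_utf8((unsigned char)*in++, out);
        }
    }
    *out = '\0';
}
```

Because the digit count is fixed, a decoder has no difficulty with \u00a333
even though a human reader might. The ambiguity is visual, not syntactic.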

JAVA is one of the target and source encodings of yudit and its uniconv
converter.
