Re: [HACKERS] UTF-8 safe ascii() function

From: Patrice Hédé <phede-ml(at)islande(dot)org>
To: pgsql-general(at)postgresql(dot)org
Cc: jm(dot)poure(at)freesurf(dot)fr
Subject: Re: [HACKERS] UTF-8 safe ascii() function
Date: 2002-05-19 09:44:13
Message-ID: 20020519114413.2265b70e.phede-ml@islande.org
Lists: pgsql-admin pgsql-general pgsql-hackers pgsql-interfaces pgsql-odbc

Hi Jean-Michel,

Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> wrote:
> Dear all,
>
> I would like to transform UTF-8 strings into Java-Unicode. Example :
> - Latin1 : 'é'
> - UTF-8 : 'é'
> - Java Unicode = '\u00233'
>
> Basically, a Unicode compatible ascii() function would be fine.
> ascii('é') should return 233.
>
> 1) Has anyone written an ascii UTF-8 safe wrapper to ascii() function?
> If yes, would you be so kind to publish this function on the list.

OK, I just gave it a try, see the attachment.

The function takes the first character of a TEXT value and returns its
UCS-2 value. I have only run some basic tests (i.e. I have not tried
3- or 4-byte UTF-8 characters). The function follows the Unicode 3.2
spec.

SELECT utf8toucs2('a'), utf8toucs2('é');
 utf8toucs2 | utf8toucs2
------------+------------
         97 |        233
(1 row)

The function returns -1 on error.
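
To make the decoding step concrete, here is a minimal Python sketch of the same idea. It is not the attached C code: the name `utf8_first_to_ucs2` is hypothetical, and rejecting 4-byte sequences (which do not fit in UCS-2) is an assumption about how such a function could behave.

```python
def utf8_first_to_ucs2(data: bytes) -> int:
    """Return the code point of the first UTF-8 character, or -1 on error."""
    if not data:
        return -1
    b0 = data[0]
    if b0 < 0x80:
        return b0  # plain ASCII, one byte
    if 0xC2 <= b0 <= 0xDF:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    else:
        # Stray continuation byte, overlong lead byte, or a 4-byte
        # sequence (assumption: treated as an error, since it lies
        # outside UCS-2).
        return -1
    if len(data) < length:
        return -1  # truncated sequence
    for b in data[1:length]:
        if b & 0xC0 != 0x80:
            return -1  # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    if cp < minimum or 0xD800 <= cp <= 0xDFFF:
        return -1  # overlong encoding or surrogate
    return cp
```

For the two inputs in the query above, the bytes `61` and `C3 A9` decode to 97 and 233 respectively.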

> 2) Are there plans to add an ascii() UTF-8 safe function to
> PostgreSQL?

I don't think the function I wrote is useful as it stands. It would be
better to write a function that converts the whole string.
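
A whole-string variant could look roughly like the following Python sketch. The name `utf8_to_code_points` and the choice to return `None` for invalid input are assumptions made for illustration, not part of the attachment.

```python
from typing import List, Optional


def utf8_to_code_points(data: bytes) -> Optional[List[int]]:
    """Decode a UTF-8 byte string into a list of Unicode code points.

    Returns None if the input is not valid UTF-8 (hypothetical
    error convention).
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return [ord(ch) for ch in text]
```

For example, the UTF-8 bytes of 'aé' would give `[97, 233]`.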

By the way, what is the encoding for Java Unicode? Is it always "\u"
followed by 5 hex digits (in which case your example is wrong)? If so,
it shouldn't be too difficult to write the relevant function, though I
wonder whether a Java program would convert an incoming '\' 'u' '0'
'0' '2' '3' '3' to the corresponding UCS-2/UTF-16 character.
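
For reference, a Unicode escape in Java source is "\u" followed by exactly four hex digits naming one UTF-16 code unit, so 'é' (233 decimal, E9 hex) is written \u00E9. The Python sketch below produces that form; the function name `to_java_escapes` and the choice to leave ASCII untouched are assumptions for illustration.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
            continue
        # Encode as UTF-16 so that characters beyond U+FFFF become
        # a surrogate pair, i.e. two \uXXXX escapes.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            unit = int.from_bytes(units[i:i + 2], "big")
            out.append("\\u%04X" % unit)
    return "".join(out)
```

Calling `to_java_escapes("é")` gives the six characters `\u00E9`.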

Maybe we should have some similar input (and output?) functionality in
psql, but I would much prefer the Perl way, \x{hex_digits}, which is
unambiguous.
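
The braces are what remove the ambiguity: the hex digits can be of any length because the closing brace marks where they end. A rough Python sketch of both directions follows; the names `to_x_escapes` and `from_x_escapes` are hypothetical, and this is not an existing psql feature.

```python
import re

# Matches \x{...} with one or more hex digits inside the braces.
_X_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def to_x_escapes(text: str) -> str:
    """Write every non-ASCII character as \\x{hex_digits}."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\x{%x}" % ord(ch) for ch in text
    )


def from_x_escapes(text: str) -> str:
    """Replace each \\x{hex_digits} with the character it names.

    chr() raises ValueError for values above 0x10FFFF; this sketch
    does not handle that case.
    """
    return _X_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With this notation `\x{e9}` followed by a literal `3` stays unambiguous, which is not true of a fixed-prefix form with a variable number of digits.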

Regards,

Patrice

--
Patrice Hédé
email: patrice hede(à)islande org
www : http://www.islande.org/

Attachment      Content-Type     Size
utf8toucs2.c    text/x-csrc      3.2 KB
utf8toucs2.sql  text/x-sql       140 bytes
Makefile        text/x-makefile  397 bytes
