From: | Patrice Hédé <phede-ml(at)islande(dot)org> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | jm(dot)poure(at)freesurf(dot)fr |
Subject: | Re: [HACKERS] UTF-8 safe ascii() function |
Date: | 2002-05-19 09:44:13 |
Message-ID: | 20020519114413.2265b70e.phede-ml@islande.org |
Lists: | pgsql-admin pgsql-general pgsql-hackers pgsql-interfaces pgsql-odbc |
Hi Jean-Michel,
Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> wrote:
> Dear all,
>
> I would like to transform UTF-8 strings into Java-Unicode. Example :
> - Latin1 : 'é'
> - UTF-8 : 'é'
> - Java Unicode = '\u00233'
>
> Basically, a Unicode compatible ascii() function would be fine.
> ascii('é') should return 233.
>
> 1) Has anyone written an ascii UTF-8 safe wrapper to ascii() function?
> If yes, would you be so kind to publish this function on the list.
OK, I just gave it a try; see the attachment.
The function takes the first character of a TEXT value and
returns its UCS-2 value. I have only done some basic tests (i.e. I have
not tried it with 3- or 4-byte UTF-8 characters). The function follows
the Unicode 3.2 spec.
SELECT utf8toucs2('a'), utf8toucs2('é');
 utf8toucs2 | utf8toucs2
------------+------------
         97 |        233
(1 row)
The function returns -1 on error.
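For readers without the attachment, the decoding step described above can be sketched in a few lines of Python. This is only an illustration, not the attached utf8toucs2.c: the name utf8_first_codepoint is hypothetical, and it returns the full Unicode code point (so a 4-byte sequence yields a value that does not fit in UCS-2). It rejects truncated, overlong, and surrogate sequences by returning -1, mirroring the error convention above.

```python
def utf8_first_codepoint(data: bytes) -> int:
    """Return the code point of the first UTF-8 character in data, or -1 on error.

    Hypothetical helper for illustration; not the code from the attachment.
    """
    if not data:
        return -1
    b0 = data[0]
    if b0 < 0x80:
        return b0  # plain ASCII, one byte
    # Pick sequence length, payload bits of the lead byte, and the smallest
    # code point that legitimately needs that many bytes (overlong check).
    if 0xC2 <= b0 <= 0xDF:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        return -1  # stray continuation byte or invalid lead byte
    if len(data) < length:
        return -1  # truncated sequence
    for b in data[1:length]:
        if b & 0xC0 != 0x80:
            return -1  # not a continuation byte (10xxxxxx)
        cp = (cp << 6) | (b & 0x3F)
    if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return -1  # overlong form, out of range, or surrogate
    return cp


print(utf8_first_codepoint("a".encode("utf-8")))
print(utf8_first_codepoint("é".encode("utf-8")))
```

The two calls correspond to the 'a' and 'é' inputs of the SELECT above.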
> 2) Are there plans to add an ascii() UTF-8 safe function to
> PostgreSQL?
I don't think the function I wrote is very useful as it stands. It would
be better to have a function that converts the whole string.
By the way, what is the encoding for Java Unicode? Is it always "\u"
followed by 5 hex digits (in which case your example is wrong)? If so,
it shouldn't be too difficult to write the relevant function, though I
wonder whether the Java program would convert an incoming '\' 'u' '0'
'0' '2' '3' '3' to the corresponding UCS-2/UTF-16 character?
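As a point of reference for the question above: in Java source code a Unicode escape is "\u" followed by exactly four hex digits, each escape naming one UTF-16 code unit, so 'é' (decimal 233, hex E9) is written \u00e9, and characters outside the Basic Multilingual Plane are written as a surrogate pair. A minimal Python sketch of a whole-string conversion under that convention follows; the name to_java_escapes is hypothetical, and literal backslashes in the input are passed through unescaped for brevity.

```python
def to_java_escapes(text: str) -> str:
    """Render non-ASCII characters as Java-style \\uXXXX escapes.

    Hypothetical helper for illustration. Each escape is one UTF-16 code
    unit (four hex digits); code points above U+FFFF become two escapes.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII is left untouched
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)
        else:
            # Split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


print(to_java_escapes("caf\u00e9"))
```

Whether the receiving Java program turns such text back into characters depends on how it reads the data: the compiler translates these escapes in source files, but a string that merely contains a backslash and a 'u' at run time is not converted automatically.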
Maybe we should have similar input (and output?) functionality in
psql, but then I would much prefer the Perl notation,
\x{hex_digits}, which is unambiguous.
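To make the "unambiguous" point concrete, here is a small Python sketch of the Perl-style \x{hex_digits} notation in both directions. It illustrates the notation only and is not an existing psql feature; the names to_perl_escapes and from_perl_escapes are hypothetical. Because the braces delimit the digits, an escape followed directly by a hex digit in the text still parses correctly, and no fixed digit count is needed.

```python
import re

# One escape: backslash, 'x', then hex digits enclosed in braces.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def to_perl_escapes(text: str) -> str:
    """Render non-ASCII characters as \\x{hex} (hypothetical helper)."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\x{%x}" % ord(ch) for ch in text
    )


def from_perl_escapes(text: str) -> str:
    """Replace each \\x{hex} with its character (hypothetical helper).

    Assumes every escape names a valid code point; chr() raises
    ValueError for values above 0x10FFFF.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


encoded = to_perl_escapes("\u00e91")  # 'é' followed by the digit '1'
print(encoded)
print(from_perl_escapes(encoded) == "\u00e91")
```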
Regards,
Patrice
--
Patrice Hédé
email: patrice hede(à)islande org
www : http://www.islande.org/
Attachment | Content-Type | Size |
---|---|---|
utf8toucs2.c | text/x-csrc | 3.2 KB |
utf8toucs2.sql | text/x-sql | 140 bytes |
Makefile | text/x-makefile | 397 bytes |