Re: UNICODE characters above 0x10000

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: UNICODE characters above 0x10000
Date: 2004-08-07 06:29:20
Message-ID: 5066E5A966339E42AA04BA10BA706AE56088@rodrick.geeknet.com.au
Lists: pgsql-hackers pgsql-patches

Possibly, since I got it wrong once more...
I'm about to give up, but an updated patch is attached.

Regards,

John Hansen

-----Original Message-----
From: Oliver Elphick [mailto:olly(at)lfix(dot)co(dot)uk]
Sent: Saturday, August 07, 2004 3:56 PM
To: Tom Lane
Cc: John Hansen; Hackers; Patches
Subject: Re: [HACKERS] UNICODE characters above 0x10000

On Sat, 2004-08-07 at 06:06, Tom Lane wrote:
> Now it's entirely possible that the underlying support is a few bricks
> shy of a load --- for instance I see that pg_utf_mblen thinks there
> are no UTF8 codes longer than 3 bytes whereas your code goes to 4.
> I'm not an expert on this stuff, so I don't know what the UTF8 spec
> actually says. But I do think you are fixing the code at the wrong
> level.

UTF-8 characters can be up to 6 bytes long:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
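
For readers following the byte-length question, here is a minimal sketch in C of how the length of a UTF-8 sequence follows from its lead byte, and how a 4-byte sequence yields a code point at or above 0x10000. The names utf8_seq_len and utf8_decode are hypothetical and are not taken from wchar.c or from the attached patch. The sketch assumes the RFC 3629 limits (at most 4 bytes, code points up to U+10FFFF); the original UTF-8 design also defined 5- and 6-byte forms, which is where the figure of 6 comes from.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL's pg_utf_mblen.
 * Returns the sequence length implied by the lead byte under RFC 3629,
 * or 0 if the byte cannot start a sequence.
 */
int
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;               /* 110xxxxx, excluding overlong C0/C1 */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;               /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;               /* 11110xxx, capped at U+10FFFF */
    return 0;                   /* continuation byte or out of range */
}

/*
 * Hypothetical decoder: returns the code point of the first sequence in
 * s[0..n-1], or -1 if the sequence is truncated or malformed.
 * It checks continuation bytes only; overlong 3- and 4-byte forms and
 * surrogates are not rejected in this sketch.
 */
long
utf8_decode(const unsigned char *s, size_t n)
{
    int     len = utf8_seq_len(n > 0 ? s[0] : 0xFF);
    long    cp;
    int     i;

    if (len == 0 || (size_t) len > n)
        return -1;
    if (len == 1)
        return s[0];

    /* Keep only the payload bits of the lead byte. */
    cp = s[0] & (0xFF >> (len + 1));

    for (i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return -1;          /* not a 10xxxxxx continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}
```

With these definitions, the four bytes F0 90 80 80 decode to 0x10000, the first code point that a 3-byte limit cannot represent.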

glibc provides various routines (mb...) for handling Unicode. How many
of our supported platforms don't have these? If there are still some
that don't, wouldn't it be better to use the standard routines where
they do exist?

--
Oliver Elphick olly(at)lfix(dot)co(dot)uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"Be still before the LORD and wait patiently for him;
do not fret when men succeed in their ways, when they
carry out their wicked schemes."
Psalms 37:7

Attachment Content-Type Size
wchar.c.patch application/octet-stream 2.4 KB
