| From: | "John Hansen" <john(at)geeknet(dot)com(dot)au> |
|---|---|
| To: | "Dennis Bjorklund" <db(at)zigo(dot)dhs(dot)org> |
| Cc: | "Takehiko Abe" <keke(at)mac(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-patches(at)postgresql(dot)org> |
| Subject: | Re: [PATCHES] UNICODE characters above 0x10000 |
| Date: | 2004-08-07 13:40:36 |
| Message-ID: | 5066E5A966339E42AA04BA10BA706AE56173@rodrick.geeknet.com.au |
| Lists: | pgsql-hackers pgsql-patches |
> -----Original Message-----
> From: Dennis Bjorklund [mailto:db(at)zigo(dot)dhs(dot)org]
> Sent: Saturday, August 07, 2004 11:23 PM
> To: John Hansen
> Cc: Takehiko Abe; pgsql-hackers(at)postgresql(dot)org
> Subject: RE: [PATCHES] [HACKERS] UNICODE characters above 0x10000
>
> On Sat, 7 Aug 2004, John Hansen wrote:
>
> > Now, is it really 24 bits tho?
> > Afaict, it's really 21 (0 - 10FFFF or 0 - xxx10000 11111111 11111111)
>
> Yes, up to 0x10ffff should be enough.
>
> The 24 is not really important, this is all about what utf-8
> strings to accept as input. The strings are stored as utf-8
> strings and when processed inside pg it uses wchar_t that is
> 32 bit (on some systems at least). By restricting the utf-8
> input to unicode we can in the future store each character as
> 3 bytes if we want.
Which brings us back to something like the attached...
>
> --
> /Dennis Björklund
>
>
>
Regards,
John Hansen
| Attachment | Content-Type | Size |
|---|---|---|
| wchar.c.patch | application/octet-stream | 2.5 KB |
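
The attached patch is not reproduced in this archive. As a rough illustration of the point discussed above (accept only UTF-8 input that decodes to a code point in the range 0 to 0x10FFFF, which fits in 21 bits and therefore in 3 bytes), here is a minimal sketch in C. The function name `utf8_decode_checked` and the macro `UNICODE_MAX` are hypothetical and are not taken from wchar.c or from the patch; the extra rejection of overlong forms and surrogates is an assumption about what a strict check might include, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: highest Unicode code point, which fits in 21 bits. */
#define UNICODE_MAX 0x10FFFFu

/*
 * Decode one UTF-8 sequence starting at s, with len bytes available.
 * Returns the sequence length (1..4) and stores the code point in *cp.
 * Returns 0 if the sequence is truncated, malformed, overlong,
 * a surrogate, or decodes to a value above UNICODE_MAX.
 */
size_t
utf8_decode_checked(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs an n-byte sequence. */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t      n;
    size_t      i;
    uint32_t    c;

    assert(s != NULL && cp != NULL);

    if (len == 0)
        return 0;

    /* The lead byte determines the sequence length. */
    if (s[0] < 0x80)
    {
        n = 1;
        c = s[0];
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        n = 2;
        c = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        n = 3;
        c = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        n = 4;
        c = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation or 5/6-byte lead byte */

    if (len < n)
        return 0;               /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx. */
    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_cp[n])
        return 0;               /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;               /* UTF-16 surrogate, not a character */
    if (c > UNICODE_MAX)
        return 0;               /* beyond the Unicode range */

    *cp = c;
    return n;
}
```

With a check along these lines, the 4-byte sequence F4 8F BF BF (U+10FFFF) is accepted, while F4 90 80 80 (which would decode to 0x110000) and any 5- or 6-byte form are rejected.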