Re: UNICODE characters above 0x10000

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc:	John Hansen <john(at)geeknet(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>, Patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: UNICODE characters above 0x10000
Date:	2004-08-07 06:49:06
Message-ID:	27050.1091861346@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> ... This also means that the start byte can never start with 7 or 8
> ones, that is illegal and should be tested for and rejected. So the
> longest utf-8 sequence is 6 bytes (and the longest character needs 4
> bytes (or 31 bits)).

Tatsuo would know more about this than me, but it looks from here like
our coding was originally designed to support only 16-bit-wide internal
characters (i.e., 16-bit pg_wchar datatype width). I believe that the
regex library limitation here is gone, and that as far as that library
is concerned we could assume a 32-bit internal character width. The
question at hand is whether we can support 32-bit characters or not ---
and if not, what's the next bug to fix?
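
To make the width question concrete, here is a minimal sketch of the lead-byte test Dennis describes, plus a decoder that writes into a 32-bit value. It is not the PostgreSQL code: the names utf8_seq_len and utf8_decode are hypothetical, it assumes the original 6-byte (31-bit) UTF-8 scheme discussed above, and it does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): sequence length implied
 * by a UTF-8 lead byte under the original 31-bit scheme. Returns 0 for a
 * byte that cannot start a sequence.
 */
static int
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 0;   /* 10xxxxxx: continuation byte, not a start */
    if (lead < 0xE0) return 2;   /* 110xxxxx */
    if (lead < 0xF0) return 3;   /* 1110xxxx */
    if (lead < 0xF8) return 4;   /* 11110xxx */
    if (lead < 0xFC) return 5;   /* 111110xx */
    if (lead < 0xFE) return 6;   /* 1111110x */
    return 0;                    /* 0xFE, 0xFF: 7 or 8 leading ones, illegal */
}

/*
 * Hypothetical decoder: reads one sequence from s (avail bytes available)
 * into a 32-bit value. Returns the number of bytes consumed, or 0 if the
 * input is malformed or truncated. Overlong forms are NOT checked here.
 */
static int
utf8_decode(const unsigned char *s, size_t avail, uint32_t *out)
{
    /* Payload bits carried by the lead byte, indexed by sequence length. */
    static const unsigned char lead_mask[7] =
        {0x00, 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01};
    uint32_t cp;
    int len;
    int i;

    if (avail == 0)
        return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t) len > avail)
        return 0;

    cp = (uint32_t) (s[0] & lead_mask[len]);
    for (i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)   /* every trailing byte must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (uint32_t) (s[i] & 0x3F);
    }
    *out = cp;
    return len;
}
```

Feeding it the bytes F0 90 80 80 yields 0x10000, which is the first value that does not fit in a 16-bit character type; the longest 6-byte form tops out at 31 bits, so a 32-bit type is wide enough for anything this scheme can express.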

regards, tom lane

In response to

Re: UNICODE characters above 0x10000 at 2004-08-07 06:27:31 from Dennis Bjorklund

Responses

Re: UNICODE characters above 0x10000 at 2004-08-07 07:01:37 from Dennis Bjorklund
Re: [PATCHES] UNICODE characters above 0x10000 at 2004-08-07 10:09:13 from Tatsuo Ishii
