Re: [HACKERS] Unicode combining characters

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: phede-ml(at)islande(dot)org
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Unicode combining characters
Date: 2001-10-10 01:12:01
Message-ID: 20011010101201N.t-ishii@sra.co.jp
Lists: pgsql-hackers pgsql-patches

> > After applying your patches, are 4-byte UTF-8 sequences converted to
> > UCS-2 (2 bytes) or UCS-4 (4 bytes) in pg_utf2wchar_with_len()? If they
> > become 4 bytes, we are in trouble: the current regex implementation does
> > not handle charsets whose characters are 4 bytes wide.
>
> *sigh* Yes, it does encode them to four bytes :(
>
> Three solutions, then:
>
> 1) we support these supplementary characters, knowing that they won't
> work with regexes,
>
> 2) I back out the change, but then anyone using these characters will
> get something weird, since the decoding would be faulty: they would
> be handled as 3-byte UTF-8 characters, and the fourth byte would then
> become a "faulty char"... not very good, as the 3-byte version is
> still not a valid UTF-8 sequence!
>
> 3) we fix the regex engine within the next 24 hours, before the beta
> deadline :/
>
> I must say that I doubt anyone will use these characters in the
> next few months: they are mostly extended Chinese characters, plus
> the Old Italic, Deseret, and Gothic scripts, Byzantine and Western
> musical symbols, and the Mathematical Alphanumeric Symbols.
>
> I would prefer solution 1), as I think it is better to allow these
> characters, even with a temporary restriction on regexes, than to
> fail completely on them. As for solution 3), we may still work on it
> in the next few months :) [I haven't even looked at the regex engine
> yet, so I don't know the implications of what I have just said!]
>
> What do you think?

I think 2) is not very good. Instead, we should reject these 4-byte UTF-8
strings outright. After all, we are not ready for them.

BTW, the other parts of your patch look good. Peter, what do you think?
--
Tatsuo Ishii
