On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
> Umm, but isn't that because your encoding is using one code point?
>
> See the OP's explanation w.r.t. canonical equivalence.
>
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

Here is a test case that shows the problem, provided your terminal can
display combining characters (xterm appears to work):
SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length
----------+-------------
 é        |           1
(1 row)
SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length
----------+-------------
 é        |           2
(1 row)
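
The same distinction can be reproduced outside the server. What follows is only a minimal sketch in Python, assuming nothing beyond the standard unicodedata module; the helper name count_code_points is made up for illustration and is not part of any PostgreSQL interface. It counts code points, which is what the char_length results above appear to reflect, and shows that normalizing to NFC makes the two spellings agree.

```python
import unicodedata


def count_code_points(s: str) -> int:
    """Hypothetical helper: count code points, not user-perceived characters."""
    return len(s)


precomposed = "\u00e9"       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Canonically equivalent strings, but a different number of code points.
print(count_code_points(precomposed))  # 1
print(count_code_points(decomposed))   # 2

# NFC composes the base letter and the combining accent into U+00E9.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed, count_code_points(nfc))  # True 1
```

Whether a length function should count code points or combined sequences is exactly the open question here; the sketch only shows that normalization is one way to make the two inputs compare and count alike.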