From: | Victor Snezhko <snezhko(at)indorsoft(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Corruption of multibyte identifiers on UTF-8 locale |
Date: | 2006-09-23 17:33:47 |
Message-ID: | uu02ylc78.fsf@indorsoft.ru |
Lists: | pgsql-bugs |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> correct UTF-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace(): it interprets its argument as a wide character and
>> converts it.
>
> Indeed, and I am certainly wondering why we should not just say that
> you've got a broken locale definition there. There is absolutely no
> doubt that the ctype.h functions are defined to work on char, not
> wchar.
Agreed, but such corruption indicates that there is non-multibyte-safe
(octet-wise) case conversion somewhere. At best (with a fully working
locale) it will make case conversion do nothing instead of actually
converting.
> They have no business mangling high-bit-set bytes in a multibyte
> encoding.
--
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru