Re: Corruption of multibyte identifiers on UTF-8 locale

From:	Victor Snezhko <snezhko(at)indorsoft(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Corruption of multibyte identifiers on UTF-8 locale
Date:	2006-09-23 17:33:47
Message-ID:	uu02ylc78.fsf@indorsoft.ru
Lists:	pgsql-bugs

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> correct UTF-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace(): it interprets its argument as a wide character and
>> converts it.
>
> Indeed, and I am certainly wondering why we should not just say that
> you've got a broken locale definition there. There is absolutely no
> doubt that the ctype.h functions are defined to work on char, not
> wchar.

Agreed, but such corruption indicates that there is non-multibyte-safe
(octet-wise) case conversion somewhere. At best (with a fully working
locale), it will do nothing where an actual case conversion was
expected.
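
To make the distinction concrete, here is a minimal sketch in C. It is
not PostgreSQL's actual code, and both function names are hypothetical.
The first function folds case octet by octet through tolower(), so what
happens to bytes >= 0x80 depends entirely on the locale's ctype tables.
The second folds only ASCII A-Z and passes high-bit bytes through, so a
multibyte sequence such as 0xd1 0x82 cannot be damaged (at the cost of
leaving non-ASCII letters unfolded).

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Octet-wise folding (hypothetical name): calls tolower() on every byte.
 * For bytes >= 0x80 the result depends on the current locale, so a lead
 * or continuation byte of a UTF-8 sequence may be changed if the locale's
 * tables map it.
 */
void
downcase_octetwise(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = (char) tolower((unsigned char) s[i]);
}

/*
 * Conservative folding (hypothetical name): only ASCII 'A'..'Z' are
 * lowercased. Bytes with the high bit set are never modified, so
 * multibyte characters survive intact but are not case-folded.
 */
void
downcase_ascii_only(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) s[i];

        if (ch >= 'A' && ch <= 'Z')
            s[i] = (char) (ch + ('a' - 'A'));
    }
}
```

Called on the bytes 'T', 0xd1, 0x82, '1', the second function changes
only the leading 'T'. A real multibyte-aware conversion would have to
decode whole characters first (for example via wide characters) rather
than work on single octets.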

> They have no business mangling high-bit-set bytes in a multibyte
> encoding.

--
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru

In response to

Re: Corruption of multibyte identifiers on UTF-8 locale at 2006-09-23 16:36:41 from Tom Lane

Responses

Re: Corruption of multibyte identifiers on UTF-8 locale at 2006-09-23 17:44:29 from Tom Lane
