Re: Corruption of multibyte identifiers on UTF-8 locale

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Victor Snezhko <snezhko(at)indorsoft(dot)ru>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Corruption of multibyte identifiers on UTF-8 locale
Date:	2006-09-23 16:36:41
Message-ID:	25540.1159029401@sss.pgh.pa.us
Lists:	pgsql-bugs

Victor Snezhko <snezhko(at)indorsoft(dot)ru> writes:
> correct utf-8 byte sequence is 0xd18231, so it looks like we call
> tolower() somewhere on parts of multibyte characters, and it does the
> same as isspace() - it interprets its argument as a wide character, and
> converts it.

Indeed, and I am certainly wondering why we should not just say that
you've got a broken locale definition there. There is absolutely no
doubt that the ctype.h functions are defined to work on char, not wchar.
They have no business mangling high-bit-set bytes in a multibyte
encoding.
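
To illustrate the point about high-bit-set bytes, here is a minimal sketch of
a downcasing routine that cannot be affected by a locale's ctype tables. It
is an assumption-laden example, not PostgreSQL's actual code: the name
ascii_downcase is hypothetical, and the routine simply folds ASCII A-Z while
copying every other byte, including each byte of a multibyte UTF-8 sequence,
through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): lowercase only
 * ASCII 'A'..'Z' in place.  Bytes with the high bit set, such as the
 * parts of a UTF-8 multibyte character, are never passed to tolower()
 * and are therefore left untouched regardless of the locale.
 */
static void
ascii_downcase(char *s)
{
    size_t      len;
    size_t      i;

    assert(s != NULL);
    len = strlen(s);

    for (i = 0; i < len; i++)
    {
        /* Inspect the byte as unsigned so values >= 0x80 stay positive. */
        unsigned char ch = (unsigned char) s[i];

        if (ch >= 'A' && ch <= 'Z')
            s[i] = (char) (ch + ('a' - 'A'));
    }
}
```

Applied to a buffer holding the bytes 'T', 0xd1, 0x82, '1', only the 'T' is
changed; the two bytes 0xd1 0x82 pass through as they were. Whether skipping
locale-aware folding for non-ASCII bytes is acceptable depends on the
encoding in use, so treat this as a sketch of the defensive approach rather
than a recommended fix.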

regards, tom lane

In response to

Corruption of multibyte identifiers on UTF-8 locale at 2006-09-23 10:23:52 from Victor Snezhko

Responses

Re: Corruption of multibyte identifiers on UTF-8 locale at 2006-09-23 17:33:47 from Victor Snezhko
