
Re: Corruption of multibyte identifiers on UTF-8 locale

From: Victor Snezhko <snezhko(at)indorsoft(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Corruption of multibyte identifiers on UTF-8 locale
Date: 2006-09-23 18:34:50
Message-ID: uhcyyl9dh.fsf@indorsoft.ru
Lists: pgsql-bugs
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> Agreed, but such corruption indicates that there is non-multibyte-safe
>> (octet-wise) case conversion somewhere. At best (with a fully working
>> locale) it will cause case conversion to do nothing instead of
>> actually converting.
>
> Yours is the first installation I've heard of that fails to get this
> right, and the code in question (downcase_truncate_identifier) has
> been like that since PG 7.4.something ...

This code from downcase_truncate_identifier():

	else if (ch >= 0x80 && isupper(ch))
		ch = tolower(ch);

just can't work on multibyte encodings unless tolower() can magically
guess which Unicode character it is operating on (given only one octet
of it). With my (OK, broken) locale definition it corrupts multibyte
characters; with a working locale definition it must fail to downcase
multibyte characters in identifiers. Unless I'm again missing something
obvious...
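
To illustrate the failure mode, here is a minimal sketch. It assumes a
single-byte case table with ISO-8859-1 rules as a stand-in for whatever a
locale's tolower() does with high-bit octets; latin1_tolower() and
downcase_octets() are hypothetical names, not PostgreSQL functions, and
this is not the table from my actual locale.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a single-byte locale's tolower():
 * ASCII A-Z plus the ISO-8859-1 uppercase range 0xC0..0xDE
 * (0xD7 is the multiplication sign, not a letter).
 */
static unsigned char latin1_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
        return (unsigned char) (ch + 0x20);
    return ch;
}

/* Octet-wise downcasing, in the style of the fragment above. */
static void downcase_octets(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = latin1_tolower(s[i]);
}
```

Fed the UTF-8 encoding of U+042F (octets 0xD0 0xAF), this turns the lead
octet 0xD0 into 0xF0 and leaves 0xAF alone. The result, 0xF0 0xAF, is no
longer a valid two-octet sequence, because 0xF0 announces a four-octet
character.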

But, from the comment above:
 * SQL99 specifies Unicode-aware case normalization, which we don't yet 
 * have the infrastructure for.

OK, I see that a lot of work is required to fix it. Are there any plans
to either switch to wide-char strings or do per-character (rather than
per-octet) processing?
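
By per-character processing I mean something roughly like the sketch
below. It is only an illustration under stated assumptions: the names
utf8_seq_len(), toy_lower() and downcase_utf8() are hypothetical, the
case mapping is a toy table covering ASCII and U+0410..U+042F only, and
input is assumed to be well-formed UTF-8. A real fix would need a full
Unicode-aware mapping.

```c
#include <stddef.h>

/* Length in octets of the UTF-8 sequence introduced by lead octet b. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;               /* stray octet: step over it unchanged */
}

/* Hypothetical toy mapping: ASCII A-Z and Cyrillic U+0410..U+042F. */
static unsigned long toy_lower(unsigned long cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;
    if (cp >= 0x0410 && cp <= 0x042F)
        return cp + 0x20;
    return cp;
}

/* Downcase whole characters, never individual octets of a sequence. */
static void downcase_utf8(unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        size_t n = utf8_seq_len(s[i]);

        if (n > len - i)
            break;          /* truncated sequence: leave the rest alone */

        if (n == 1)
        {
            if (s[i] < 0x80)
                s[i] = (unsigned char) toy_lower(s[i]);
        }
        else if (n == 2)
        {
            /* Decode, map, and re-encode the two-octet character. */
            unsigned long cp = ((s[i] & 0x1FUL) << 6) | (s[i + 1] & 0x3FUL);

            cp = toy_lower(cp);
            s[i]     = (unsigned char) (0xC0 | (cp >> 6));
            s[i + 1] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        /* 3- and 4-octet sequences pass through unchanged in this sketch. */

        i += n;
    }
}
```

With this, the octets 0xD0 0xAF (U+042F) become 0xD1 0x8F (U+044F), which
is still a valid sequence, and characters the table does not know about
are left untouched instead of being damaged.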

-- 
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru


