Corruption of multibyte identifiers on UTF-8 locale

From: Victor Snezhko <snezhko(at)indorsoft(dot)ru>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Corruption of multibyte identifiers on UTF-8 locale
Date: 2006-09-23 10:23:52
Message-ID: u4puynao7.fsf@indorsoft.ru
Lists: pgsql-bugs

Hello,

It looks like we have a more serious problem with multibyte identifiers.
When I run the following sequence of queries:

CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т1' AND relkind = 'r') then
CREATE TABLE т1 (
к1 int NOT NULL,
PRIMARY KEY (к1)
);
end if;
return 0;
END;
$$ LANGUAGE plpgsql;

SELECT CreateOrAlterTable();

CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т2' AND relkind = 'r') then
CREATE TABLE т2 (
к2 int NOT NULL,
PRIMARY KEY (к2)
);
end if;
return 0;
END;
$$ LANGUAGE plpgsql;

and then try to create the second table:

SELECT CreateOrAlterTable();

I get the following error (on HEAD as well as on patched 8.1.4):

ERROR: invalid byte sequence for encoding "UTF8": 0xf18231
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: SQL statement "SELECT not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE '?1' AND relkind = 'r')"
PL/pgSQL function "createoraltertable" line 2 at if

The correct UTF-8 byte sequence is 0xd18231, so it looks like we call
tolower() somewhere on individual bytes of multibyte characters, and it
does the same thing as isspace(): it interprets its argument as a wide
character and converts it.
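
To make the mechanism concrete, here is a minimal Python sketch (not PostgreSQL code) that models a per-byte tolower() which treats each byte as a code point below 0x100. Under that assumption the lead byte 0xd1 of 'т' is read as U+00D1 (Ñ) and lowercased to U+00F1 (ñ), i.e. 0xf1. That is the lead byte of a four-byte UTF-8 sequence, and the trailing 0x31 cannot complete it, which matches the 0xf18231 in the error. The helper names are hypothetical, and the Latin-1 folding rule is an assumption about the BSD libc inferred from the error, not taken from its source.

```python
def latin1_style_tolower(b: int) -> int:
    """Hypothetical model of a per-byte tolower() that treats each byte as a
    code point below 0x100: folds A-Z and 0xC0-0xDE (except 0xD7, the
    multiplication sign) by adding 0x20."""
    if 0x41 <= b <= 0x5A or (0xC0 <= b <= 0xDE and b != 0xD7):
        return b + 0x20
    return b


def ascii_only_tolower(b: int) -> int:
    """Safer variant: fold only ASCII A-Z and leave bytes with the high bit
    set (pieces of multibyte UTF-8 characters) untouched."""
    return b + 0x20 if 0x41 <= b <= 0x5A else b


def downcase(ident: bytes, fold) -> bytes:
    """Apply a byte-wise case-folding function to every byte of ident."""
    return bytes(fold(b) for b in ident)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ident = "т1".encode("utf-8")                 # bytes d1 82 31
bad = downcase(ident, latin1_style_tolower)  # lead byte d1 becomes f1
good = downcase(ident, ascii_only_tolower)   # unchanged
print(bad.hex(), is_valid_utf8(bad))         # f18231 False
print(good.hex(), is_valid_utf8(good))       # d18231 True
```

The ASCII-only variant can never split a multibyte character, at the cost of not folding non-ASCII letters at all.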

Plain CREATE TABLE statements work, as do CREATE TABLE statements
executed inside a procedure without the "IF NOT EXISTS" check.

So either we don't support UTF-8 on the BSDs for now (by the way, this
also needs to be checked on the less popular BSD flavors), or we need
to fix this somehow, e.g. by using only wide-character checks, which
will complicate things...
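
The wide-character route mentioned above can be sketched roughly as follows: decode the whole identifier into code points, fold case per character, then re-encode, so a multibyte sequence is never split. This is a hypothetical Python model assuming a UTF-8 server encoding. `downcase_wide` is an illustrative name. Python's str.lower() applies Unicode's default case mapping rather than a C locale's towlower(), so results can differ for some characters.

```python
def downcase_wide(ident: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical wide-character downcasing: decode to code points,
    lowercase each character, then re-encode. Invalid input raises
    UnicodeDecodeError instead of being silently mangled."""
    text = ident.decode(encoding)
    return text.lower().encode(encoding)


upper = "Т1".encode("utf-8")   # Cyrillic capital TE followed by '1'
lower = downcase_wide(upper)
print(lower.hex())             # d18231
```

The extra cost is that every identifier must be decoded and re-encoded, and the encoding must be known at the point where identifiers are downcased.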

--
WBR, Victor V. Snezhko
E-mail: snezhko(at)indorsoft(dot)ru
