Bug #659: lower()/upper() bug on ->multibyte<- DB

From: pgsql-bugs(at)postgresql(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-07 14:51:12
Message-ID: 20020507145112.BE39A476356@postgresql.org
Lists: pgsql-bugs pgsql-hackers

Michael Enke (michael(dot)enke(at)wincor-nixdorf(dot)com) reports a bug with a severity of 2
The lower the number, the more severe it is.

Short Description
lower()/upper() bug on ->multibyte<- DB

Long Description
OS: Linux Kernel 2.4.4, PostgreSQL version 7.2.1
lower() and upper() don't work as expected in multibyte
databases, although they work fine with single-byte encodings.
The behaviour can be reproduced as follows:
at initdb time, LC_CTYPE was set to de_DE
createdb -E UTF-8 name
export PGCLIENTENCODING=LATIN1
psql -U name
--------------------------------------------------
=> select lower('Ä'); -- german umlaut A, capital
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: german umlaut a, lower case
--------------------------------------------------
=> select lower('ä'); -- german umlaut a, lower case
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: german umlaut a, lower case
--------------------------------------------------
=> select upper('ä'); -- it doesn't translate
 ä
-- I expected to see: Ä
--------------------------------------------------
=> select upper('Ä'); -- this works fine
 Ä
--------------------------------------------------

The same happens with the O umlaut and U umlaut (ö/Ö, ü/Ü).
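One explanation consistent with all four results, offered here as an assumption rather than a confirmed diagnosis, is that lower() and upper() apply the single-byte de_DE (ISO-8859-1) case mapping to each byte of the UTF-8 string instead of to each character. The Python sketch below simulates that; `bytewise_case` and `simulate` are hypothetical helpers, not PostgreSQL code. Under this assumption, lower() turns the UTF-8 lead byte 0xC3 (Ã in Latin-1) into 0xE3 (ã), leaving an invalid UTF-8 sequence that cannot be converted for the LATIN1 client, while upper() finds nothing to change in the bytes of ä (0xC3 0xA4) and returns it unchanged.

```python
# Hypothesis only (not PostgreSQL source): case mapping applied per byte using
# ISO-8859-1 rules, as a C tolower()/toupper() in a de_DE locale would do.


def bytewise_case(data: bytes, upper: bool) -> bytes:
    """Map every byte as if it were a standalone ISO-8859-1 character."""
    out = bytearray()
    for b in data:
        ch = chr(b)  # in ISO-8859-1, byte value == Unicode code point
        mapped = ch.upper() if upper else ch.lower()
        # A C toupper()/tolower() returns one byte; keep the original byte when
        # Python's mapping leaves Latin-1 or expands (e.g. 'ß'.upper() == 'SS').
        if len(mapped) == 1 and ord(mapped) < 256:
            out.append(ord(mapped))
        else:
            out.append(b)
    return bytes(out)


def simulate(func: str, text: str) -> str:
    """Run lower()/upper() byte-wise on UTF-8 input, then decode the result."""
    result = bytewise_case(text.encode("utf-8"), upper=(func == "upper"))
    try:
        return result.decode("utf-8")
    except UnicodeDecodeError:
        # In the report, this is where conversion to the LATIN1 client fails.
        return "invalid UTF-8: " + result.hex(" ")


if __name__ == "__main__":
    for func, arg in [("lower", "Ä"), ("lower", "ä"), ("upper", "ä"), ("upper", "Ä")]:
        expected = arg.lower() if func == "lower" else arg.upper()
        print(f"{func}('{arg}') -> {simulate(func, arg)}  (expected {expected})")
```

The same helper gives the same pattern for ö/Ö (0xC3 0xB6 / 0xC3 0x96) and ü/Ü (0xC3 0xBC / 0xC3 0x9C), since every one of them shares the lead byte 0xC3.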

If you want to reproduce this and don't have ä/Ä on your keyboard,
create a table with one column of type varchar(1) (in a multibyte database),
then create a file with the following input:
ae is \u00e4
AE is \u00c4
Convert it with the JDK's native2ascii tool:
native2ascii -reverse -encoding UTF-8 <this-file> <new-file>
In <new-file> you will see two bytes on each line (the UTF-8 encodings
0xC3 0xA4 and 0xC3 0x84): the first line shows Ã (A with a tilde) and a
euro sign, the second line shows Ã and a dotted box.
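To see where those glyphs come from, the sketch below takes each character's UTF-8 bytes and decodes them as a single-byte charset. It assumes the reporter's viewer used ISO-8859-15 (Latin-9), where byte 0xA4 is the euro sign; in plain ISO-8859-1 the same byte is the currency sign ¤. Byte 0x84 is a C1 control character in both charsets, which is why it shows up as a dotted box. `latin9_view` is a hypothetical helper name.

```python
def latin9_view(ch: str):
    """Return (hex of the UTF-8 bytes, the same bytes decoded as ISO-8859-15)."""
    raw = ch.encode("utf-8")  # two bytes per umlaut in UTF-8
    # Assumption: the viewer interprets each byte as one Latin-9 character.
    return raw.hex(" "), raw.decode("iso8859_15")


if __name__ == "__main__":
    for name, ch in [("ae", "\u00e4"), ("AE", "\u00c4")]:
        hex_bytes, shown = latin9_view(ch)
        print(name, hex_bytes, repr(shown))
```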
Unset PGCLIENTENCODING, start psql, and run:
insert into <table> values('<copy and paste first two bytes>');
insert into <table> values('<copy and paste second two bytes>');
export PGCLIENTENCODING=LATIN1
Then, in psql, select * from <table>; will show you ä and Ä.
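This last step relies on the server converting the stored UTF-8 bytes into ISO-8859-1 for the LATIN1 client: ä (0xC3 0xA4) becomes the single byte 0xE4 and Ä (0xC3 0x84) becomes 0xC4. A rough Python model of that conversion follows; `to_latin1_client` is a hypothetical helper, and its error text is simply copied from the report for comparison, not taken from PostgreSQL's conversion code. A byte string that is not valid UTF-8 (such as 0xE3 0x84) cannot be converted at all, which matches the error the lower() calls in the report produced.

```python
def to_latin1_client(stored: bytes) -> bytes:
    """Model: convert server-side UTF-8 bytes to ISO-8859-1 for the client."""
    try:
        return stored.decode("utf-8").encode("latin-1")
    except (UnicodeDecodeError, UnicodeEncodeError) as exc:
        # Message text copied from the bug report, for comparison only.
        raise ValueError("Could not convert UTF-8 to ISO8859-1") from exc


if __name__ == "__main__":
    print(to_latin1_client(b"\xc3\xa4").hex())  # -> e4 (ä in ISO-8859-1)
    print(to_latin1_client(b"\xc3\x84").hex())  # -> c4 (Ä in ISO-8859-1)
```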

Sample Code

No file was uploaded with this report
