Bug #659: lower()/upper() bug on ->multibyte<- DB

From: pgsql-bugs(at)postgresql(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-07 14:51:12
Message-ID: 20020507145112.BE39A476356@postgresql.org
Lists: pgsql-bugs, pgsql-hackers
Michael Enke (michael(dot)enke(at)wincor-nixdorf(dot)com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
lower()/upper() bug on ->multibyte<- DB

Long Description
OS: Linux Kernel 2.4.4, PostgreSQL version 7.2.1
lower() and upper() do not work as expected on multibyte
databases. They work fine with single-byte encodings.
The behaviour can be reproduced as follows:
at initdb: LC_CTYPE was set to de_DE
createdb -E UTF-8 name
export PGCLIENTENCODING=LATIN1
psql -U name
--------------------------------------------------
=> select lower('Ä');  -- german umlaut A, capital
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: ä (german umlaut a, lower case)
--------------------------------------------------
=> select lower('ä');  -- german umlaut a, lower case
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: ä (german umlaut a, lower case)
--------------------------------------------------
=> select upper('ä');  -- it doesn't translate
ä
-- I expected to see: Ä
--------------------------------------------------
=> select upper('Ä');  -- this works fine
Ä
--------------------------------------------------

The same happens with O umlaut and U umlaut (ö/Ö, ü/Ü).
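
One possible reading of the four results above, offered as an assumption rather than a confirmed diagnosis: if lower() and upper() map each byte separately using the single-byte de_DE (Latin-1) rules, the two-byte UTF-8 sequences for A umlaut and a umlaut would be altered one byte at a time. The Python sketch below only models that hypothesis. The helper names (`bytewise_lower`, `bytewise_upper`, `is_valid_utf8`) are made up for illustration and are not PostgreSQL functions.

```python
# Sketch only: models lower()/upper() as per-byte Latin-1 case mapping.
# That per-byte behaviour is an assumption, not something the report confirms.

def _map_byte(b: int, fn) -> int:
    """Apply a str case function to one byte, staying inside Latin-1."""
    mapped = fn(chr(b))
    # Keep the byte unchanged when the mapping leaves the single-byte range
    # (for example sharp s, whose uppercase is two characters).
    if len(mapped) == 1 and ord(mapped) < 256:
        return ord(mapped)
    return b

def bytewise_lower(data: bytes) -> bytes:
    return bytes(_map_byte(b, str.lower) for b in data)

def bytewise_upper(data: bytes) -> bytes:
    return bytes(_map_byte(b, str.upper) for b in data)

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

A_UML = "\u00c4".encode("utf-8")  # capital A umlaut, bytes c3 84
a_uml = "\u00e4".encode("utf-8")  # lower-case a umlaut, bytes c3 a4

for name, fn in (("lower", bytewise_lower), ("upper", bytewise_upper)):
    for label, raw in (("A umlaut", A_UML), ("a umlaut", a_uml)):
        out = fn(raw)
        print(name, label, raw.hex(), "->", out.hex(),
              "valid UTF-8:", is_valid_utf8(out))
```

Under this model, lowering turns the lead byte c3 into e3 in both cases, which leaves a byte sequence that is no longer valid UTF-8 and would be consistent with the "Could not convert UTF-8 to ISO8859-1" error. Uppercasing leaves both sequences unchanged, which would explain why upper() returns the lower-case umlaut untouched and only appears to work for the capital one.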

If you want to reproduce this and don't have ä/Ä on your keyboard,
you can create a table with one column of type varchar(1) (on a multibyte DB).
Create a file with the following input:
ae is \u00e4
AE is \u00c4
Then, from Java, use the command:
native2ascii -reverse -utf8 <this-file> <new-file>
In <new-file> you will see:
in the first line, 2 bytes: A (with tilde on top) and the Euro symbol,
in the second line, 2 bytes: A (with tilde on top) and a dotted box.
Unset PGCLIENTENCODING and call psql:
insert into table values('<copy and paste first two bytes>');
insert into table values('<copy and paste second two bytes>');
export PGCLIENTENCODING=LATIN1
In psql, select * from table; will show you the a-umlaut and A-umlaut.
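
The two-byte sequences described in these steps can be checked without native2ascii. The Python sketch below encodes \u00e4 and \u00c4 as UTF-8 and shows how those bytes look when read with a single-byte character set. The Euro symbol mentioned in the steps suggests the terminal used ISO-8859-15 (Latin-9), which is an assumption; in ISO-8859-1 the same byte a4 is the generic currency sign. The function name `utf8_bytes_viewed_as` is hypothetical.

```python
# Sketch: show the UTF-8 bytes of a character as a single-byte terminal
# would display them. The choice of ISO-8859-15 is an assumption based on
# the Euro symbol mentioned in the report.

def utf8_bytes_viewed_as(text: str, codec: str) -> str:
    """Encode text as UTF-8, then misread the bytes using a single-byte codec."""
    return text.encode("utf-8").decode(codec)

for ch in ("\u00e4", "\u00c4"):
    raw = ch.encode("utf-8")
    print(raw.hex(),
          repr(utf8_bytes_viewed_as(ch, "iso8859_15")),
          repr(utf8_bytes_viewed_as(ch, "latin_1")))
```

The a umlaut is stored as bytes c3 a4 and the A umlaut as c3 84. The second byte of the A umlaut falls in the C1 control range, which many terminals draw as a box.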

Sample Code


No file was uploaded with this report

