Re: Bug #659: lower()/upper() bug on ->multibyte<- DB

From: "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-08 09:15:07
Message-ID: 3CD8EC9B.7519ECD6@wincor-nixdorf.com
Lists: pgsql-bugs pgsql-hackers

Hello,

> This is not a bug but expected behavior. Locale support expects the
> input string to be encoded in ISO-8859-1 (because you set the locale to
> de_DE) while you supply UTF-8.

What is the difference between inserting a string and calling a function with a string argument?
Insert works well, and so does output; only the functions lower(), upper() and initcap() cause problems.
This also works: select a from a where a = 'X'; -- X is German umlaut a, lowercase / German umlaut A, capital
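
One way to picture the difference, as a sketch only: an insert stores the UTF-8 bytes untouched, while a case function has to transform them. The Python below assumes, hypothetically, that lower() maps each byte as if it were a single ISO-8859-1 character. The helper names are made up for illustration and nothing here is taken from the PostgreSQL source. Under that assumption the two UTF-8 bytes of the umlaut become a sequence that is no longer valid UTF-8, which would be consistent with a conversion error on output.

```python
def bytewise_latin1_lower(data: bytes) -> bytes:
    """Hypothetical model: lowercase every byte as one ISO-8859-1 character."""
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes as an INSERT would store them: capital A umlaut in UTF-8.
stored = "\u00c4".encode("utf-8")

# Storing and reading back does not touch the bytes, so they stay valid.
print(stored.hex(), is_valid_utf8(stored))

# Byte-wise lowercasing treats the lead byte 0xC3 as a Latin-1 letter
# and rewrites it, leaving a truncated multibyte sequence behind.
mangled = bytewise_latin1_lower(stored)
print(mangled.hex(), is_valid_utf8(mangled))
```

If this model is close to what happens, it would explain why plain storage, output, and equality comparison work while lower(), upper() and initcap() do not.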

> Try an explicit encoding conversion function:
>
> select lower(convert('Ä', 'LATIN1'));

I tried: select lower(convert('X', 'LATIN1')); -- X is German umlaut A, capital
but the result was the same:
ERROR: Could not convert UTF-8 to ISO8859-1

I then compiled postgres without locale support and created a DB with -E UTF-8.
I created a table and inserted the character U+00C4 (German umlaut A, capital), encoded as UTF-8.
I called "select lower(a) from a;"
Now, without locale support, I didn't get the error, but I also didn't get
the right result. The right result would be the character U+00E4 (German umlaut a, lower case),
independent of the locale!
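
The expected result can be illustrated with a small sketch. It uses Python's built-in Unicode case mapping, which does not consult the process locale, and is meant only to show the mapping being asked for, not how PostgreSQL implements lower(). The function name utf8_lower is a hypothetical helper.

```python
def utf8_lower(data: bytes) -> bytes:
    """Decode UTF-8, apply Unicode lowercasing, and encode back to UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


upper = "\u00c4"   # capital A umlaut
lower = "\u00e4"   # small a umlaut

# The code point and its UTF-8 byte sequence are different things:
# U+00C4 is stored as the two bytes C3 84, U+00E4 as C3 A4.
for ch in (upper, lower):
    print(f"U+{ord(ch):04X} -> UTF-8 bytes {ch.encode('utf-8').hex()}")

# Lowercasing on decoded characters gives the expected mapping.
result = utf8_lower(upper.encode("utf-8"))
print(result.hex())
```

The point of working on decoded characters rather than on raw bytes is that the case mapping is then defined by Unicode and does not depend on LC_CTYPE.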

Regards,
Michael Enke

Tatsuo Ishii wrote:
>
> > Short Description
> > lower()/upper() bug on ->multibyte<- DB
> >
> > Long Description
> > OS: Linux Kernel 2.4.4, PostgreSQL version 7.2.1
> > lower() and upper() don't work as expected for multibyte
> > databases. They work fine for one-byte encodings.
> > The behaviour can be reproduced as follows:
> > at initdb: LC_CTYPE was set to de_DE
> > createdb -E UTF-8 name
> > export PGCLIENTENCODING=LATIN1
> > psql -U name
> > --------------------------------------------------
> > => select lower('Ä'); -- German umlaut A, capital
> > ERROR: Could not convert UTF-8 to ISO8859-1
> > -- I expected to see: ä (German umlaut a, lower case)
>
> This is not a bug but expected behavior. Locale support expects the
> input string to be encoded in ISO-8859-1 (because you set the locale to
> de_DE) while you supply UTF-8. Try an explicit encoding conversion
> function:
>
> select lower(convert('Ä', 'LATIN1'));
>
> Note that '\304' must be an actual German umlaut A, capital character,
> not an octal escaped notation.
> --
> Tatsuo Ishii
