Re: Bug #659: lower()/upper() bug on ->multibyte<- DB

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: michael(dot)enke(at)wincor-nixdorf(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-09 01:06:13
Message-ID: 20020509100613P.t-ishii@sra.co.jp
Lists: pgsql-bugs pgsql-hackers

> > You input "select lower('X')" as ISO-8859-1 encoded, then it is sent
> > to the backend. The backend converts it to UTF-8. Then lower() is
> > called with a UTF-8 string input. lower() calls tolower(), which
> > expects the input to be ISO-8859-1 since you set the locale to de_DE.
> > This is the source of the problem.
>
> Excuse me, this does not seem to be the source of the problem.
> If I call select lower(table_col) from table;
> then I also don't get back the lower case character but the original case if it is a multibyte char.

This doesn't work for the same reason as above. The backend extracts
table_col from the table, where it is encoded in UTF-8, while lower()
expects ISO-8859-1. Try:

select convert(lower(convert(table_col, 'LATIN1')),'LATIN1','UNICODE')
from your_table;
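
To illustrate the mismatch outside the database, here is a minimal
Python sketch. It is only a simulation, not the backend's code: the
helper names latin1_tolower_bytes and lower_utf8_via_latin1 are
hypothetical, and the per-byte lowercasing stands in for what a
single-byte tolower() would do under an ISO-8859-1 locale. The exact
wrong result on a real system depends on the platform's tolower().

```python
def latin1_tolower_bytes(data: bytes) -> bytes:
    """Lowercase each byte as if it were one ISO-8859-1 character."""
    out = bytearray()
    for b in data:
        # chr(b) gives the code point with the same number as the byte,
        # which is the ISO-8859-1 interpretation of that byte.
        out += chr(b).lower().encode("latin-1")
    return bytes(out)


def lower_utf8_via_latin1(data: bytes) -> bytes:
    """Mirror the nested convert() calls: UTF-8 -> LATIN1 -> lower -> UTF-8."""
    latin1 = data.decode("utf-8").encode("latin-1")
    return latin1_tolower_bytes(latin1).decode("latin-1").encode("utf-8")


utf8_upper = "Ä".encode("utf-8")            # two bytes in UTF-8
direct = latin1_tolower_bytes(utf8_upper)   # bytes treated as Latin-1 text
roundtrip = lower_utf8_via_latin1(utf8_upper)

print("direct   :", direct)
print("roundtrip:", roundtrip)
print("expected :", "ä".encode("utf-8"))
```

In this sketch only the round trip through LATIN1 yields the UTF-8
bytes of the lower case letter. Lowercasing the UTF-8 bytes directly
changes one of the two bytes in isolation and no longer produces the
intended character.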

> I have now also removed everything below the data directory, exported LC_CTYPE as de_DE.utf8, and run initdb.
> With pg_controldata I see LC_CTYPE is de_DE.utf8
> Now I no longer get the ERROR: cannot convert UTF-8 to ISO8859-1, but the translation doesn't work:
> MB chars are not translated, I get back the original case.

I don't think using de_DE.utf8 helps. The locale support just calls
tolower(), which is not able to handle multibyte chars.
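
As a rough illustration of why a per-byte routine cannot lowercase a
multibyte character, consider the Python sketch below. It assumes the
per-byte routine leaves every non-ASCII byte untouched, which is how
Python's bytes.lower() behaves; a real C library's tolower() may treat
such bytes differently. The function names are hypothetical.

```python
def lower_per_byte(text: str) -> str:
    """Lowercase the UTF-8 bytes one at a time (ASCII letters only)."""
    # bytes.lower() maps only the ASCII bytes A-Z; each byte of a
    # multibyte sequence is left as it is.
    return text.encode("utf-8").lower().decode("utf-8")


def lower_per_character(text: str) -> str:
    """Lowercase whole characters, which requires decoding them first."""
    return text.lower()


sample = "ÄBC"
print(len(sample), "characters,", len(sample.encode("utf-8")), "bytes")
print("per byte     :", lower_per_byte(sample))
print("per character:", lower_per_character(sample))
```

The per-byte version returns the multibyte character in its original
case, which matches the symptom reported above, while the ASCII
letters are lowercased as usual.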

> > Oops. That should be:
> >
> > select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');
> > It looks ugly, but works.
>
> Sorry, it doesn't work. The same here, I get back the case I put in at X, not the lower case.

Are you sure you are using the de_DE locale (not de_DE.utf8)?
Attached are sample scripts that work for me with the de_DE locale.
Here is also my pg_controldata output.

$ pg_controldata
pg_control version number: 71
Catalog version number: 200201121
Database state: IN_PRODUCTION
pg_control last modified: Thu May 9 08:37:20 2002
Current log file id: 0
Next log file segment: 1
Latest checkpoint location: 0/18C860
Prior checkpoint location: 0/1503A0
Latest checkpoint's REDO location: 0/172054
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's StartUpID: 8
Latest checkpoint's NextXID: 217
Latest checkpoint's NextOID: 24748
Time of latest checkpoint: Thu May 9 08:37:17 2002
Database block size: 8192
Blocks per segment of large relation: 131072
LC_COLLATE: de_DE
LC_CTYPE: de_DE
--
Tatsuo Ishii

Attachment Content-Type Size
bbb text/plain 422 bytes
