Re: Bug #659: lower()/upper() bug on ->multibyte<- DB

From: "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-13 09:57:21
Message-ID: 3CDF8E01.DC0B2817@wincor-nixdorf.com
Lists: pgsql-bugs, pgsql-hackers
Tatsuo Ishii wrote:
> 
> [Cc:ed to hackers]
> 
> (trying select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');)
> 
> > OK, this works now (I can't reproduce why it didn't work the first time).
> 
> Good.
> 
> > Are there plans to implement this so that I can use lower()/upper() on multibyte data
> > according to the SQL standard (without convert)?
> 
> SQL standard? The SQL standard says nothing about locale. So making
> lower() (and others) "locale aware" is quite different from the SQL
> standard's point of view. Of course this does not mean "locale
> support" should not be a part of PostgreSQL's implementation of
> SQL. However, we should be aware of the limitations of "locale support"
> (as well as multibyte support). IMO they are just stopgaps until CREATE
> CHARACTER SET etc. is implemented.
> 
> > I could do it if you tell me where the final tolower()/toupper() happens
> > (but not before the middle of June).
> 
> As a short-term solution, hiding convert() from users might
> be a good idea (what I mean here is a kind of automatic execution of
> convert()). The hardest part is that we have no way to find the
> relationship between a particular locale and its encoding. For example,
> you know that LATIN1 is the appropriate encoding for the de_DE locale,
> but PostgreSQL does not.

I think this is really not hard to do for UTF-8. I don't need to know the
relation between the locale and the encoding. Look at this:
We can use the LC_CTYPE from pg_controldata or, alternatively, the LC_CTYPE
at server startup. For nearly every locale (de_DE, ja_JP, ...) there also exists
a *.utf8 locale (de_DE.utf8, ja_JP.utf8, ...), at least with current Linux glibc.
We don't need to know more than this. If we call
setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
then glibc takes care of all the conversions. I attach a small demo program
which sets the locale ja_JP.utf8 and is able to translate German umlaut A (upper) to
German umlaut a (lower).
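
A minimal C sketch of this idea, assuming glibc and that the LC_CTYPE value is available as a plain string. It is not the attached mb.c. The helper names utf8_locale_name(), select_utf8_ctype() and utf8_lower() are hypothetical, and so is the rule of dropping any existing codeset or @modifier before appending ".utf8". The case mapping itself is left to glibc through the standard mbstowcs()/towlower()/wcstombs() round trip.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: turn an LC_CTYPE value such as "de_DE",
 * "de_DE@euro" or "ja_JP.eucJP" into "<lang_TERRITORY>.utf8".
 * Assumption: any existing codeset and @modifier are dropped.
 * Returns 0 on success, -1 if buf is too small. */
static int utf8_locale_name(const char *lc_ctype, char *buf, size_t bufsz)
{
    assert(lc_ctype != NULL && buf != NULL);

    size_t base = strcspn(lc_ctype, ".@");   /* length of "lang_TERRITORY" */
    const char *suffix = ".utf8";
    size_t need = base + strlen(suffix) + 1; /* + terminating NUL */

    if (need > bufsz)
        return -1;
    memcpy(buf, lc_ctype, base);
    strcpy(buf + base, suffix);
    return 0;
}

/* Switch LC_CTYPE to the UTF-8 variant of lc_ctype. Returns the new
 * locale name, or NULL if that locale is not installed. */
static const char *select_utf8_ctype(const char *lc_ctype)
{
    char name[128];

    if (utf8_locale_name(lc_ctype, name, sizeof name) != 0)
        return NULL;
    return setlocale(LC_CTYPE, name);
}

/* Lower-case a multibyte string using the case mapping of the currently
 * selected LC_CTYPE locale. Returns a malloc'd string the caller must
 * free, or NULL if the input is invalid for the locale's encoding or
 * memory runs out. */
static char *utf8_lower(const char *in)
{
    assert(in != NULL);

    size_t n = mbstowcs(NULL, in, 0);        /* wide chars, excluding NUL */
    if (n == (size_t) -1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, in, n + 1);

    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towlower((wint_t) w[i]);

    size_t m = wcstombs(NULL, w, 0);         /* bytes, excluding NUL */
    char *out = (m == (size_t) -1) ? NULL : malloc(m + 1);
    if (out != NULL)
        wcstombs(out, w, m + 1);
    free(w);
    return out;
}
```

In this sketch a caller would pass the LC_CTYPE value once to select_utf8_ctype() and then call utf8_lower() per string; where such a call would live inside the backend is exactly the open question of this thread.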
What I don't know (I have to ask a glibc developer) is:
Why do dozens of *.utf8 locales exist, and what is the difference
between all the /usr/lib/locale/*.utf8/LC_CTYPE files?
But for all existing *.utf8 locales, the conversion of German umlauts works properly.
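
The observation that every installed *.utf8 locale lower-cases German umlauts could be checked with a small probe like the one below. It is a sketch assuming glibc, where wchar_t holds Unicode code points regardless of the locale; umlaut_lowercases() is a hypothetical name, and the list of locale names to probe (for instance the output of `locale -a`) is left to the caller.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical probe: does towlower() map U+00C4 (A with diaeresis)
 * to U+00E4 (a with diaeresis) under the given LC_CTYPE locale?
 * Returns 1 if it does, 0 if it does not, -1 if the locale cannot be
 * selected. The caller's LC_CTYPE setting is restored afterwards. */
static int umlaut_lowercases(const char *locale_name)
{
    assert(locale_name != NULL);

    /* setlocale(..., NULL) may return static storage that the next
     * setlocale() call overwrites, so keep a private copy. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        cur = "C";
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, cur, len);

    int result = -1;
    if (setlocale(LC_CTYPE, locale_name) != NULL)
        result = (towlower((wint_t) 0xC4) == (wint_t) 0xE4);

    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

Because the probe restores the previous LC_CTYPE, it can be called in a loop over many locale names without side effects on the rest of the program.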

Regards,
Michael

PS: I'm out of the office for the next 3 weeks and therefore won't be able to read my mail.


Attachment: mb.c
Description: text/plain (1.8 KB)
