Re: [HACKERS] Bug #659: lower()/upper() bug on

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: michael(dot)enke(at)wincor-nixdorf(dot)com, pgsql-bugs(at)postgresql(dot)org,pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Bug #659: lower()/upper() bug on
Date: 2002-05-14 08:35:44
Message-ID: 1021365344.2382.13.camel@taru.tm.ee
Lists: pgsql-bugs, pgsql-hackers

On Tue, 2002-05-14 at 03:29, Tatsuo Ishii wrote:
> > I think it is really not hard to do this for UTF-8. I don't have to know the
> > relation between the locale and the encoding. Look at this:
> > We can use the LC_CTYPE from pg_controldata or alternatively the LC_CTYPE
> > at server startup. For nearly every locale (de_DE, ja_JP, ...) there exists
> > also a locale *.utf8 (de_DE.utf8, ja_JP.utf8, ...) at least for the actual Linux glibc.
> 
> My Linux box does not have *.utf8 locales at all. Probably not so many
> platforms have them up to now, I guess.

Which Linux distribution do you use?

At least newer Red Hat Linux releases have them, and I suspect that all
newer glibc versions are capable of using them.

> 
> > We don't need to know more than this. If we call
> > setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
> > then glibc is aware of doing all the conversions. I attach a small demo program
> > which set the locale ja_JP.utf8 and is able to translate german umlaut A (upper) to
> > german umlaut a (lower).
> 
> Interesting idea, but the problem is we have to decide to use exactly
> one locale before initdb. In my understanding, users willing to use
> Unicode (UTF-8) tend to use multiple languages. This is natural since
> Unicode claims it can handle several languages. For example, user
> might want to have a table like this in a UTF-8 database:
> 
> create table t1(
>        english text,	-- English message
>        germany text,	-- German message
>        japanese text	-- Japanese message
> );
> 
> If you have set the locale to, say, de_DE, then:
> 
> select lower(japanese) from t1;
>
> would be executed in the de_DE.utf8 locale, and I doubt it produces any
> meaningful results for Japanese.

IIRC it may, as I think that the de_DE.utf8 locale will include the full
UTF-8 upper/lower tables, at least on Linux.
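
As a rough sketch of the setlocale() idea quoted above (this is not the demo
program attached to the earlier mail, which is not reproduced here), the
following Python uses the standard locale module. The helper names
with_utf8_suffix and try_set_ctype are hypothetical, and replacing an existing
codeset with utf8 is an assumption; whether the call succeeds depends entirely
on which locales are installed on the machine.

```python
import locale


def with_utf8_suffix(name: str) -> str:
    """Return the locale name with its codeset set to utf8.

    Hypothetical helper: "de_DE" becomes "de_DE.utf8", and a name that
    already ends in ".utf8" is returned unchanged.
    """
    # Split off an optional "@modifier" such as "@euro".
    base, sep, modifier = name.partition("@")
    # Drop any existing codeset, e.g. "de_DE.ISO-8859-1" -> "de_DE"
    # (assumption: the caller wants UTF-8 regardless of what was given).
    lang = base.split(".", 1)[0]
    return f"{lang}.utf8{sep}{modifier}"


def try_set_ctype(name: str) -> bool:
    """Try to switch LC_CTYPE to the UTF-8 variant of `name`.

    Returns False instead of raising when the locale is not installed,
    which is the situation described for boxes without *.utf8 locales.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, with_utf8_suffix(name))
        return True
    except locale.Error:
        return False


if __name__ == "__main__":
    for candidate in ("de_DE", "ja_JP.utf8", "de_DE@euro"):
        print(candidate, "->", with_utf8_suffix(candidate),
              "available:", try_set_ctype(candidate))
```

The boolean result gives a caller the chance to fall back to the original
LC_CTYPE when no *.utf8 variant exists.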

For example, en_US produces correct upper/lower results for Estonian,
though collation is off and some characters are missing when using
ISO-8859-1.
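
To illustrate both points, here is a small sketch in Python. Note that
Python's str.lower() uses the Unicode character database rather than the
glibc locale tables, so it only stands in for the behaviour described above
and does not reproduce it. The helper name encodable is hypothetical.

```python
def encodable(ch: str, codec: str = "iso-8859-1") -> bool:
    """Return True if `ch` can be represented in `codec` (hypothetical helper)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Estonian letters outside plain ASCII, upper case.
estonian_upper = "ÕÄÖÜŠŽ"

# Case mapping works the same regardless of which language the text is in.
lowered = estonian_upper.lower()

# Letters that cannot be stored at all in an ISO-8859-1 database.
missing = [ch for ch in estonian_upper + lowered if not encodable(ch)]

# Kanji have no case distinction, so the text passes through unchanged.
japanese = "日本語"
japanese_unchanged = japanese.lower() == japanese

if __name__ == "__main__":
    print(lowered)
    print(missing)
    print(japanese_unchanged)
```

The letters reported as missing are the ones with a caron; the rest of the
Estonian alphabet fits in ISO-8859-1.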

BTW, does the Japanese language have distinct upper- and lower-case letters?

--------------
Hannu


