Tatsuo Ishii wrote:
> [Cc:ed to hackers]
> (trying select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');)
> > Ok, this is working now (I can't reproduce why it failed the first time).
> > Is it planned to implement it so that I can write lower()/upper() for multibyte
> > according to the SQL standard (without convert)?
> SQL standard? The SQL standard says nothing about locale. So making
> lower() (and others) "locale aware" is a quite different matter from
> the SQL standard's point of view. Of course this does not mean "locale
> support" should not be a part of PostgreSQL's implementation of
> SQL. However, we should be aware of the limitations of "locale support"
> (as well as multibyte support). They are just a stopgap until CREATE
> CHARACTER SET etc. is implemented IMO.
> > I could do it if you tell me where the final tolower()/toupper() happens
> > (but not before the middle of June).
> As a short-term solution, hiding convert() from users might
> be a good idea (what I mean here is a kind of automatic execution of
> convert()). The hardest part is that we have no way to find the
> relationship between a particular locale and its encoding. For example,
> you know that LATIN1 is the appropriate encoding for the de_DE locale,
> but PostgreSQL does not.
I think it is really not hard to do this for UTF-8. I don't have to know the
relation between the locale and the encoding. Look at this:
We can use the LC_CTYPE from pg_controldata, or alternatively the LC_CTYPE
at server startup. For nearly every locale (de_DE, ja_JP, ...) there also
exists a *.utf8 locale (de_DE.utf8, ja_JP.utf8, ...), at least in the current Linux glibc.
We don't need to know more than this. If we call
setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
then glibc takes care of all the conversions. I attach a small demo program
which sets the locale ja_JP.utf8 and is able to translate the uppercase German
umlaut A (Ä) to the lowercase German umlaut a (ä).
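The idea above can be sketched roughly as follows. This is not the attached demo program, only a minimal illustration under stated assumptions: the helper names utf8_locale_name(), mb_lower() and demo_lower_umlaut() are hypothetical, the 256-character buffer limit is arbitrary, and replacing an existing non-UTF-8 codeset suffix (and dropping any @modifier) is my own simplification. It uses only standard C library calls (setlocale, mbstowcs, towlower, wcstombs).

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: build "<locale>.utf8" from an LC_CTYPE value.
 * A name that already names UTF-8 is copied unchanged. Assumption: any
 * other codeset suffix or @modifier is cut off before appending ".utf8".
 * Returns 0 on success, -1 if the result does not fit into out. */
int utf8_locale_name(const char *base, char *out, size_t outlen)
{
    int n;

    assert(base != NULL && out != NULL);
    if (strstr(base, ".utf8") != NULL || strstr(base, ".UTF-8") != NULL) {
        n = snprintf(out, outlen, "%s", base);
    } else {
        size_t stem = strcspn(base, ".@");   /* length of "ll_CC" part */
        n = snprintf(out, outlen, "%.*s.utf8", (int) stem, base);
    }
    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}

/* Hypothetical helper: lower-case a multibyte string using the current
 * LC_CTYPE locale. Returns 0 on success, -1 on an invalid byte sequence
 * or if a buffer is too small. */
int mb_lower(const char *in, char *out, size_t outlen)
{
    wchar_t wbuf[256];
    size_t  n, m, i;

    n = mbstowcs(wbuf, in, 256);             /* multibyte -> wide chars */
    if (n == (size_t) -1 || n >= 256)
        return -1;
    for (i = 0; i < n; i++)
        wbuf[i] = (wchar_t) towlower((wint_t) wbuf[i]);
    m = wcstombs(out, wbuf, outlen);         /* wide chars -> multibyte */
    if (m == (size_t) -1 || m >= outlen)
        return -1;
    return 0;
}

/* Switch LC_CTYPE to the UTF-8 variant of base and lower-case the
 * UTF-8 bytes of an uppercase A-umlaut. Returns -1 if the locale is
 * not installed on this system or the conversion fails. */
int demo_lower_umlaut(const char *base)
{
    char name[64];
    char lower[16];

    if (utf8_locale_name(base, name, sizeof name) != 0)
        return -1;
    if (setlocale(LC_CTYPE, name) == NULL) {
        fprintf(stderr, "locale %s is not available\n", name);
        return -1;
    }
    if (mb_lower("\xC3\x84", lower, sizeof lower) != 0)
        return -1;
    printf("%s: %s\n", name, lower);
    return 0;
}
```

Calling demo_lower_umlaut("ja_JP") from a main() would try ja_JP.utf8; whether that locale is installed depends on the system, so the function reports a missing locale rather than assuming it exists.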
What I don't know (I have to ask a glibc developer) is:
Why do dozens of *.utf8 locales exist, and what is the difference
between all the /usr/lib/locale/*.utf8/LC_CTYPE files?
But for all existing *.utf8 locales, the conversion of German umlauts works properly.
PS: I will be out of the office for the next 3 weeks and therefore not able to read my mail.
Description: text/plain (1.8 KB)