Re: Bug #659: lower()/upper() bug on ->multibyte<- DB

From: "Enke, Michael" <michael(dot)enke(at)wincor-nixdorf(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug #659: lower()/upper() bug on ->multibyte<- DB
Date: 2002-05-13 09:57:21
Message-ID: 3CDF8E01.DC0B2817@wincor-nixdorf.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Tatsuo Ishii wrote:
>
> [Cc:ed to hackers]
>
> (trying select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');)
>
> > Ok, this is working now (I cann't reproduce why not at the first time).
>
> Good.
>
> > Is it planned to implement it so that I can write lower()/ upper() for multibyte
> > according to SQL standard (without convert)?
>
> SQL standard? The SQL standard says nothing about locale. So making
> lower() (and others) "locale aware" is far different from the SQL
> standard of point of view. Of course this does not mean "locale
> support" is should not be a part of PostgreSQL's implementation of
> SQL. However, we should be aware the limitation of "locale support"
> (as well as multibyte support). They are just the stopgap util CREATE
> CHARACTER SET etc. is implemnted IMO.
>
> > I could do it if you tell me where the final tolower()/toupper() happens.
> > (but not before middle of June).
>
> For the short term solution making convert() hiding from users might
> be a good idea (what I mean here is kind of auto execution of
> convert()). The hardest part is there's no idea how we could find a
> relationship bewteen particular locale and the encoding. For example,
> you know that for de_DE locale using LATIN1 encoding is appropreate,
> but PostgreSQL does not.

I think it is really not hard to do this for UTF-8. I don't have to know the
relation between the locale and the encoding. Look at this:
We can use the LC_CTYPE from pg_controldata or alternatively the LC_CTYPE
at server startup. For nearly every locale (de_DE, ja_JP, ...) there exists
also a locale *.utf8 (de_DE.utf8, ja_JP.utf8, ...) at least for the actual Linux glibc.
We don't need to know more than this. If we call
setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
then glibc is aware of doing all the conversions. I attach a small demo program
which set the locale ja_JP.utf8 and is able to translate german umlaut A (upper) to
german umlaut a (lower).
What I don't know (have to ask a glibc delveloper) is:
Why there exists dozens of locales *.utf8 and what is the difference
between all /usr/lib/locale/*.utf8/LC_CTYPE?
But for all existing locales *.utf8, the conversion of german umlauts is working properly.

Regards,
Michael

PS: I'm not in my office for the next 3 weeks and therefore not able to read my mails.

Attachment Content-Type Size
mb.c text/plain 1.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message pgsql-bugs 2002-05-13 11:13:57 Bug #667: Lib needed when install rpm
Previous Message Tom Lane 2002-05-13 03:56:44 Re: Bug #666: vacuum dies when called from plpgsql after large delete

Browse pgsql-hackers by date

  From Date Subject
Next Message D'Arcy J.M. Cain 2002-05-13 10:21:19 Re: Further info : Very high load average but no cpu utilization ?
Previous Message Christopher Kings-Lynne 2002-05-13 05:14:46 Re: TRUNCATE