Re: UPPER()/LOWER() and UTF-8

From: Alexey Mahotkin <alexm(at)w-m(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: UPPER()/LOWER() and UTF-8
Date: 2003-11-05 15:11:05
Message-ID: 873cd2g8ae.fsf@dim.w-m.ru
Lists: pgsql-hackers

>>>>> "TL" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

TL> upper/lower aren't
TL> going to work desirably in any multi-byte character set
TL> encoding.

>> Can you please point me at their implementation? I do not
>> understand why that's impossible.

TL> Because they use <ctype.h>'s toupper() and tolower()
TL> functions, which only work on single-byte characters.

Aha, that's in src/backend/utils/adt/formatting.c, right?

Yes, I see, it goes byte by byte and uses toupper(). I believe we
could look at the locale, and if it is UTF-8, then use (or copy)
e.g. g_utf8_strup/strdown, right?

http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup

I believe that patch could be written in a matter of hours.
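
To make the problem concrete, here is a rough sketch of a byte-wise
loop of the kind described above. It is not the actual formatting.c
code; bytewise_upper() and utf8_codepoints() are names I made up for
illustration. It assumes the program runs in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the formatting.c code): upper-case a
 * NUL-terminated buffer in place, one byte at a time, the way a
 * <ctype.h>-based loop does.
 */
void
bytewise_upper(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}

/*
 * Hypothetical helper: count the code points in a UTF-8 string by
 * skipping continuation bytes (bit pattern 10xxxxxx).
 */
size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Given "caf\xc3\xa9" (the last character is the two bytes 0xC3 0xA9 in
UTF-8), bytewise_upper() upper-cases the three ASCII letters and, in
the "C" locale, leaves the two trailing bytes untouched, so the
accented letter is never case-mapped. Under a single-byte locale each
of those bytes would be treated as a separate character. The string is
5 bytes long but holds only 4 code points, which is why any fix has to
work on decoded characters rather than bytes.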

TL> There has been some discussion of using <wctype.h> where
TL> available, but this has a number of issues, notably figuring
TL> out the correct mapping from the server string encoding (eg
TL> UTF-8) to unpacked wide characters. At minimum we'd need to
TL> know which charset the locale setting is expecting, and there
TL> doesn't seem to be a portable way to find that out.
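
For reference, a minimal sketch of what the <wctype.h> route might
look like. wide_upper() is a hypothetical name, and the sketch assumes
the caller has already arranged (via setlocale(LC_CTYPE, ...)) for the
locale's charset to match the encoding of the input string, which is
exactly the part that is hard to guarantee portably.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: upper-case a multibyte string by unpacking it
 * to wide characters.  The input is interpreted in the charset of the
 * current LC_CTYPE locale (an assumption; nothing here checks that it
 * matches the real encoding of the data).
 *
 * Returns 0 on success, -1 if the input is not valid in that charset,
 * if memory runs out, or if the result does not fit in out.
 */
int
wide_upper(const char *in, char *out, size_t outsize)
{
    size_t   len = strlen(in);
    size_t   nwide, nbytes, i;
    wchar_t *wbuf;

    /* A string of len bytes holds at most len characters. */
    wbuf = malloc((len + 1) * sizeof(wchar_t));
    if (wbuf == NULL)
        return -1;

    nwide = mbstowcs(wbuf, in, len + 1);
    if (nwide == (size_t) -1)
    {
        free(wbuf);
        return -1;
    }

    for (i = 0; i < nwide; i++)
        wbuf[i] = (wchar_t) towupper((wint_t) wbuf[i]);

    /* If wcstombs fills the buffer exactly, it does not terminate it. */
    nbytes = wcstombs(out, wbuf, outsize);
    free(wbuf);
    if (nbytes == (size_t) -1 || nbytes >= outsize)
        return -1;
    return 0;
}
```

With plain ASCII input this works in any locale. For UTF-8 data it
only gives sensible results when LC_CTYPE names a UTF-8 locale, so the
code is only as correct as the guess about which charset the locale
expects.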

TL> IIRC, Peter thinks we must abandon use of libc's locale
TL> functionality altogether and write our own locale layer before
TL> we can really have all the locale-specific functionality we
TL> want.

I believe that native Unicode strings (together with human-language
handling) should be introduced as an (almost) separate data type (one
that has nothing to do with locale), but maybe that's blue-sky.

--alexm
