Re: UTF-8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tomi NA <hefest(at)gmail(dot)com>, Martins Mihailovs <martins(dot)mihailovs(at)europrojects(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8
Date: 2006-10-13 17:22:22
Message-ID: 3001.1160760142@sss.pgh.pa.us
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Characters haven't fit in an unsigned char in a very long time. It's
> obviously bogus for any multibyte encoding (the code even says so). For
> such encodings you could use the system's towupper() (ANSI C/Unix98),
> which will work on any Unicode char.

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/oracle_compat.c?rev=1.67

* If the system provides the needed functions for wide-character manipulation
* (which are all standardized by C99), then we implement upper/lower/initcap
* using wide-character functions. Otherwise we use the traditional <ctype.h>
* functions, which of course will not work as desired in multibyte character
* sets. Note that in either case we are effectively assuming that the
* database character encoding matches the encoding implied by LC_CTYPE.
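As a rough illustration of the wide-character path that comment describes, here is a minimal sketch, not the code in oracle_compat.c. The function name wide_upper is hypothetical. It assumes the input bytes are in the encoding implied by the current LC_CTYPE, which is the same assumption the comment spells out.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * wide_upper (hypothetical name): return a malloc'd upper-cased copy of a
 * multibyte string, or NULL on failure. Decoding and encoding both use the
 * current LC_CTYPE locale, so src must be in the encoding that locale implies.
 */
char *
wide_upper(const char *src)
{
    size_t   len = strlen(src);
    size_t   nwide, nbytes, i;
    wchar_t *wide;
    char    *dst;

    /* A string never decodes to more wide characters than it has bytes. */
    wide = malloc((len + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    nwide = mbstowcs(wide, src, len + 1);
    if (nwide == (size_t) -1)
    {
        free(wide);             /* invalid byte sequence for this locale */
        return NULL;
    }

    /* Case-map one whole character at a time, not one byte at a time. */
    for (i = 0; i < nwide; i++)
        wide[i] = (wchar_t) towupper((wint_t) wide[i]);

    /* Each wide character encodes to at most MB_CUR_MAX bytes. */
    dst = malloc(nwide * MB_CUR_MAX + 1);
    if (dst != NULL)
    {
        nbytes = wcstombs(dst, wide, nwide * MB_CUR_MAX + 1);
        if (nbytes == (size_t) -1)
        {
            free(dst);          /* character not representable */
            dst = NULL;
        }
    }
    free(wide);
    return dst;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup and free() the returned string. In the default "C" locale only ASCII letters are mapped, and if the data's encoding does not match LC_CTYPE the decode step fails or produces wrong results.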

regards, tom lane

In response to

  • Re: UTF-8 at 2006-10-13 16:58:27 from Martijn van Oosterhout