Re: [GENERAL] trouble with to_char('L')

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-22 20:14:53
Message-ID: 201003222014.o2MKErr17486@momjian.us
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> >
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works. Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
>
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
>
> I'm not sure why temporarily changing lc_ctype was required in the
> original patch. That code is not included in my patch, but please
> let me know if it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to convert the localeconv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE. We could avoid the caching, but that
would involve table lookups, etc., which seems overly complex:

+ /* convert the string to the database encoding */
+ pstr = (char *) pg_do_encoding_conversion(
+ (unsigned char *) str, strlen(str),
+ GetPlatformEncoding(), GetDatabaseEncoding());

Second, having our backend routines do the conversion seems wrong,
because someone could set LC_MONETARY to an encoding that our
database does not understand, e.g. UTF-16, but that WIN32 can
convert to a valid encoding.

We are doing all this for the reasons given in this updated comment in
my patch:

ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+ * Ideally, monetary and numeric locale symbols could be returned in
+ * any server encoding. Unfortunately, the WIN32 API does not allow
+ * setlocale() to return values in a codepage/CTYPE that uses more
+ * than two bytes per character, like UTF-8:
+ *
+ * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+ *
+ * Evidently, LC_CTYPE allows us to control the encoding used
+ * for strings returned by localeconv(). The Open Group
+ * standard, mentioned at the top of this C file, doesn't
+ * explicitly state this.
+ *
+ * Therefore, we set LC_CTYPE to match LC_NUMERIC or
+ * LC_MONETARY (whichever we are reading), call localeconv(),
+ * and use mbstowcs() to convert the locale-aware string, e.g.
+ * the Euro symbol (whose bytes in that codepage are not valid
+ * UTF-8), to wide characters and then to the server encoding.

One new idea would be to set LC_CTYPE to UTF-16/wide characters
unconditionally on Win32 and then always convert the result to the server
encoding with win32_wchar_to_db_encoding(), instead of setting LC_CTYPE
from the LC_MONETARY encoding and having to do a double conversion.
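
The single-conversion idea in the paragraph above ends with one wide-to-server-encoding step. win32_wchar_to_db_encoding() is the patch's own routine and is not reproduced here; as a hypothetical, portable stand-in, the sketch below assumes the server encoding is UTF-8 and encodes a wide string by hand. The name wide_to_utf8() is invented for illustration. It combines UTF-16 surrogate pairs, so it behaves the same whether wchar_t holds UTF-16 units (as on Win32) or full code points (as on most Unix systems).

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for the final conversion step: encode a
 * NUL-terminated wide string as UTF-8 (assumed to be the server
 * encoding).  Returns a malloc'd string, or NULL on a lone surrogate,
 * an out-of-range value, or allocation failure.
 */
char *
wide_to_utf8(const wchar_t *ws)
{
    size_t      n = wcslen(ws);
    char       *out = malloc(4 * n + 1);    /* at most 4 bytes per unit */
    char       *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        unsigned long cp = (unsigned long) ws[i];

        /* combine a high/low surrogate pair into one code point */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            (unsigned long) ws[i + 1] >= 0xDC00 &&
            (unsigned long) ws[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) +
                ((unsigned long) ws[i + 1] - 0xDC00);
            i++;
        }
        else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        {
            free(out);          /* lone surrogate or not a code point */
            return NULL;
        }

        /* emit 1 to 4 bytes depending on the code point's range */
        if (cp < 0x80)
            *p++ = (char) cp;
        else if (cp < 0x800)
        {
            *p++ = (char) (0xC0 | (cp >> 6));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            *p++ = (char) (0xE0 | (cp >> 12));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            *p++ = (char) (0xF0 | (cp >> 18));
            *p++ = (char) (0x80 | ((cp >> 12) & 0x3F));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

For example, the wide string { 0x20AC, 0 } (the Euro sign) encodes to the three bytes E2 82 AC, which is the kind of result the double conversion currently has to reach in two steps.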

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
