Re: [GENERAL] trouble with to_char('L')

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 13:10:18
Message-ID: 201004201310.o3KDAIR27248@momjian.us
Lists: pgsql-general pgsql-hackers

Magnus Hagander wrote:
> > One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> > on Win32 and then just convert that always to the server encoding with
> > win32_wchar_to_db_encoding(), instead of using the encoding from
> > LC_MONETARY to set LC_CTYPE and having to do double-conversion.
>
> So, hugely late, reviving this thread.
>
> Ideally, we should definitely consider doing that. Internally, Windows
> will do it in UTF16 anyway. So we're basically doing
> UTF16->db->UTF16->UTF8->db or something like that with this patch.
>
> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).

I thought there was an LC_CTYPE for UTF16 that we could use without a
wide version of that function. If not, forget that idea.

> Looking at the patch, you're passing "item" to db_encoding_strdup()
> but it doesn't seem to be used anywhere. Leftover from previous
> experiments, or forgot to use it? Perhaps you intended for it to be in
> the error messages?

It originally was in the error message but can be removed. I have now
removed 'item' from my version of the patch.

> Also, won't this need special-casing for UTF8? Per comment in
> mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
> use MultiByteToWideChar().

Well, we don't support UTF8 for any of the non-encoding locale
categories, e.g. monetary and numeric, so I never considered that we
would support it. If we did, we would have to _pick_ a locale that is
<= 2 bytes per character, use that, and then convert to UTF8, but which
locale would we pick? Someone could use an LC_CTYPE that is <= 2 bytes
and a numeric locale that is UTF8, but I never suspected we would want
to support that, and we would need some logic to detect that case.
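
To make "logic to detect that case" concrete, here is a minimal sketch and
not code from the patch. It assumes locale names have the form
"language_territory.codeset" (for example "Japanese_Japan.932") and compares
only the codeset suffix. Both function names are hypothetical, and the
comparison is a plain strcmp(), so spellings such as "UTF-8" and "utf8" would
be treated as different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the codeset part of a locale name such as
 * "Japanese_Japan.932", or NULL if the name has no ".codeset" suffix.
 * An "@modifier" suffix, if present, is not stripped in this sketch.
 */
const char *
locale_codeset(const char *locale_name)
{
    const char *dot;

    assert(locale_name != NULL);
    dot = strrchr(locale_name, '.');
    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}

/*
 * Hypothetical check: true only when both names carry a codeset and the
 * two codesets differ. If either name has no codeset (e.g. "C"), we
 * cannot tell from the name alone and report false.
 */
bool
locale_codesets_differ(const char *ctype_name, const char *numeric_name)
{
    const char *a = locale_codeset(ctype_name);
    const char *b = locale_codeset(numeric_name);

    if (a == NULL || b == NULL)
        return false;
    return strcmp(a, b) != 0;
}
```

A caller could pass the strings returned by setlocale(LC_CTYPE, NULL) and
setlocale(LC_NUMERIC, NULL); each result should be copied before the next
setlocale() call, since the returned buffer may be reused.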

> I also note that we have char2wchar() already - we should perhaps just
> call that? Or will that use the wrong locale?

I see char2wchar() calling GetDatabaseEncoding() right away, which does
use the cached value for the server encoding, so I don't think it will
work. We can't use our existing routines to convert _from_ the current
encoding to wide characters (because our numeric encoding might not
match the server encoding). However, we can use existing code that
converts from wide to the server encoding, perhaps replacing
win32_wchar_to_db_encoding().
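
The two steps described above can be sketched in portable C. This is an
illustration under stated assumptions, not the patch itself: the function
names decode_with_locale() and wide_to_utf8() are hypothetical, and the
second step assumes the server encoding is UTF8 and encodes by hand rather
than calling any PostgreSQL routine. On Windows, real code would likely need
MultiByteToWideChar()/WideCharToMultiByte() instead of mbstowcs(), per the
mbutils.c comment mentioned earlier.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Step 1 (hypothetical): decode bytes, e.g. localeconv()->currency_symbol,
 * into wide characters using the LC_CTYPE of locale_name rather than the
 * server encoding. Returns a malloc'd string, or NULL on failure. The
 * previous LC_CTYPE setting is restored before returning.
 */
wchar_t *
decode_with_locale(const char *locale_name, const char *bytes)
{
    const char *cur;
    char       *saved;
    wchar_t    *result = NULL;
    size_t      n;

    assert(locale_name != NULL && bytes != NULL);
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || (saved = malloc(strlen(cur) + 1)) == NULL)
        return NULL;
    strcpy(saved, cur);         /* setlocale() may reuse its buffer */

    if (setlocale(LC_CTYPE, locale_name) != NULL)
    {
        n = strlen(bytes);      /* wide length never exceeds byte length */
        result = malloc((n + 1) * sizeof(wchar_t));
        if (result != NULL && mbstowcs(result, bytes, n + 1) == (size_t) -1)
        {
            free(result);       /* invalid byte sequence for this locale */
            result = NULL;
        }
        setlocale(LC_CTYPE, saved);
    }
    free(saved);
    return result;
}

/*
 * Step 2 (hypothetical): encode wide characters as UTF-8, standing in for
 * "wide to server encoding" when the server encoding is UTF8. Accepts
 * UTF-16 surrogate pairs (16-bit wchar_t) or full code points (32-bit
 * wchar_t). Returns a malloc'd string, or NULL on invalid input.
 */
char *
wide_to_utf8(const wchar_t *wide)
{
    size_t      len, i;
    char       *out, *p;

    assert(wide != NULL);
    len = wcslen(wide);
    if ((out = malloc(len * 4 + 1)) == NULL)
        return NULL;
    p = out;
    for (i = 0; i < len; i++)
    {
        unsigned long cp = (unsigned long) wide[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len)
        {
            unsigned long lo = (unsigned long) wide[i + 1];

            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        {
            free(out);          /* out of range or unpaired surrogate */
            return NULL;
        }
        if (cp < 0x80)
            *p++ = (char) cp;
        else if (cp < 0x800)
        {
            *p++ = (char) (0xC0 | (cp >> 6));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            *p++ = (char) (0xE0 | (cp >> 12));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            *p++ = (char) (0xF0 | (cp >> 18));
            *p++ = (char) (0x80 | ((cp >> 12) & 0x3F));
            *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char) (0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

The point of splitting it this way is that only step 1 depends on the
numeric or monetary locale; step 2 depends only on the server encoding, which
is why existing wide-to-server-encoding code could be reused there.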

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
