Re: Encoding issues in console and eventlog on win32

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Encoding issues in console and eventlog on win32
Date: 2009-09-14 11:08:14
Message-ID: 4AAE241E.8010406@enterprisedb.com
Lists: pgsql-hackers

Itagaki Takahiro wrote:
> We can choose a database encoding different from the platform-dependent
> one, but postgres writes server logs in the database encoding.
> As a result, server logs are filled with broken characters.
>
> The problem could occur on all platforms, however, there is a solution
> for win32. Since Windows supports wide characters to write logs, we can
> convert log texts => UTF-8 => UTF-16 and pass them to WriteConsoleW()
> and ReportEventW().
>
> Especially in Japan, encoding troubles on Windows are unavoidable
> because postgres doesn't support Shift-JIS as a database encoding,
> which is the native encoding of the Japanese edition of Windows.
>
> If we also want to support the same functionality on non-win32 platforms,
> we might need a non-throwing version of pg_do_encoding_conversion():
>
> log_message_to_write = pg_do_encoding_conversion_nothrow(
> log_message_in_database_encoding,
> GetDatabaseEncoding() /* as src_encoding */,
> GetPlatformEncoding() /* as dst_encoding */)
>
> and pass the result to stderr and syslog. But it requires major rewrites
> of conversion functions, so I'd like to submit a solution only for win32
> for now. Also, the issue is not so serious on non-win32 platforms because
> we can choose UTF-8 or EUC_* on those platforms.

Something like that seems reasonable for the Windows event log; that is
clearly supposed to be written using a specific encoding. With the log
files, we have more freedom to do what we want, and IMHO we shouldn't put
a Windows-specific hack there because, as you say, we have the same
problem on all platforms.

There's no guarantee that conversion to UTF-8 won't fail, so this isn't
totally risk-free on Windows either. Theoretically, MultiByteToWideChar
could fail too (the patch neglects to check for that), although I
suppose it can't really happen for UTF-8 -> UTF-16 conversion.
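
To make the failure case concrete, here is a minimal, portable sketch of a
UTF-8 -> UTF-16 conversion that reports failure through its return value
instead of raising an error. It is not taken from the patch or from
PostgreSQL: the function name utf8_to_utf16_checked and its signature are
hypothetical, and it only stands in for the Win32 call. With
MultiByteToWideChar() the equivalent check would be testing for a return
value of 0 (invalid input is rejected when MB_ERR_INVALID_CHARS is passed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): convert a UTF-8 buffer to
 * UTF-16 code units. Returns the number of units written, or -1 if the
 * input is malformed or dst is too small. It never raises an error, so
 * a caller can fall back to writing the raw bytes.
 */
int
utf8_to_utf16_checked(const unsigned char *src, size_t srclen,
                      uint16_t *dst, size_t dstcap)
{
    size_t      i = 0;
    size_t      n = 0;

    assert(src != NULL && dst != NULL);

    while (i < srclen)
    {
        unsigned char c = src[i++];
        uint32_t    cp;         /* decoded code point */
        uint32_t    min;        /* smallest value allowed for this length */
        int         extra;      /* continuation bytes still expected */

        if (c < 0x80)
        { cp = c; extra = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0)
        { cp = c & 0x1F; extra = 1; min = 0x80; }
        else if ((c & 0xF0) == 0xE0)
        { cp = c & 0x0F; extra = 2; min = 0x800; }
        else if ((c & 0xF8) == 0xF0)
        { cp = c & 0x07; extra = 3; min = 0x10000; }
        else
            return -1;          /* stray continuation or invalid lead byte */

        if ((size_t) extra > srclen - i)
            return -1;          /* sequence truncated at end of input */

        while (extra-- > 0)
        {
            unsigned char cc = src[i++];

            if ((cc & 0xC0) != 0x80)
                return -1;      /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000)
        {
            if (n + 1 > dstcap)
                return -1;      /* output buffer too small */
            dst[n++] = (uint16_t) cp;
        }
        else
        {
            if (n + 2 > dstcap)
                return -1;      /* no room for a surrogate pair */
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
    return (int) n;
}
```

A caller in the logging path would test for -1 and, in that case, write the
original bytes unchanged rather than lose the message. Bytes that are valid
Shift-JIS but not valid UTF-8 (for example 0x82 0xA0) are rejected by this
sketch, which is the kind of input that would reach the second step if the
first conversion had silently gone wrong.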

Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
