Re: Encoding issues in console and eventlog on win32

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Encoding issues in console and eventlog on win32
Date: 2009-10-10 05:54:25
Message-ID: 9837222c0910092254i423f1394m790e076716246bae@mail.gmail.com
Lists: pgsql-hackers

2009/10/7 Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Per your own comments earlier, and in the code, what will happen if
>> pg_do_encoding_conversion() calls ereport()? Didn't you say we need a
>> non-throwing version of it?
>
> It is hard to use encoding conversion functions in logging routines
> because they could throw errors if there are unconvertible characters.
> A non-throwing version would convert such characters into '?' or an
> escaped form (something like \888 or \xFF). If there were such
> infrastructure, we could support a "log_encoding" setting and convert
> messages to a platform-dependent encoding before writing to syslog or
> console.

Right, which we don't have at this point. That would be very useful on
Unix, I believe.
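
To make the "escaped form" idea concrete, here is a minimal sketch of only
the fallback half of such a routine. It is not an existing PostgreSQL
function: the name escape_unconvertible() is hypothetical, and it simply
treats every byte outside printable ASCII as unconvertible, whereas a real
non-throwing converter would first try the actual conversion and escape
only what fails. The point is that it cannot raise an error; it truncates
instead.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL API): copy src into dst, replacing
 * every byte outside printable ASCII with a \xNN escape. It never fails;
 * the output is truncated if dst is too small.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
escape_unconvertible(const char *src, char *dst, size_t dstlen)
{
    size_t      n = 0;

    if (dstlen == 0)
        return 0;

    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char) *src;

        if (c >= 0x20 && c < 0x7f)
        {
            /* need room for this byte plus the NUL */
            if (n + 1 >= dstlen)
                break;
            dst[n++] = (char) c;
        }
        else
        {
            /* need room for four escape bytes plus the NUL */
            if (n + 4 >= dstlen)
                break;
            snprintf(dst + n, 5, "\\x%02X", c);
            n += 4;
        }
    }
    dst[n] = '\0';
    return n;
}
```

A logging routine could call this as a last resort when a proper
conversion is unavailable, since it has no failure path of its own.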

>> pgwin32_toUTF16() needs error checking on the API calls, and needs to
>> do something reasonable if it fails.
>
> Now it returns NULL and the caller writes messages in the original encoding.

Seems reasonable. If encoding fails, I think that's the best we can do.
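
For illustration, the pattern could look roughly like the sketch below.
This is not the patch's code: to_utf16_or_null() and write_stderr_line()
are hypothetical names, and I'm assuming the caller already knows the
Windows code page of the message. Every API call is checked, and any
failure ends with the original bytes being written unchanged. On
non-Windows builds the converter is stubbed to always fail, so only the
fallback path runs there.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical stand-in for pgwin32_toUTF16(): returns a malloc'd,
 * NUL-terminated UTF-16 string, or NULL if any step fails.
 */
static wchar_t *
to_utf16_or_null(const char *str, unsigned int codepage)
{
#ifdef _WIN32
    int         len;
    wchar_t    *out;

    /* With cbMultiByte = -1 the returned length includes the NUL. */
    len = MultiByteToWideChar(codepage, 0, str, -1, NULL, 0);
    if (len <= 0)
        return NULL;
    out = malloc(sizeof(wchar_t) * (size_t) len);
    if (out == NULL)
        return NULL;
    if (MultiByteToWideChar(codepage, 0, str, -1, out, len) <= 0)
    {
        free(out);
        return NULL;
    }
    return out;
#else
    (void) str;
    (void) codepage;
    return NULL;                /* no Windows API: always fall back */
#endif
}

/* Returns 1 if UTF-16 text reached the console, 0 if we fell back. */
static int
write_stderr_line(const char *msg, unsigned int codepage)
{
    wchar_t    *w = to_utf16_or_null(msg, codepage);

#ifdef _WIN32
    if (w != NULL)
    {
        DWORD       written;
        BOOL        ok = WriteConsoleW(GetStdHandle(STD_ERROR_HANDLE),
                                       w, (DWORD) wcslen(w), &written, NULL);

        free(w);
        if (ok)
            return 1;
    }
#else
    free(w);                    /* w is NULL here; free(NULL) is a no-op */
#endif
    /* Conversion or console write failed: emit the original bytes. */
    fputs(msg, stderr);
    return 0;
}
```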

> Also, I added the following check before calling pgwin32_toUTF16()
>    (errordata_stack_depth < ERRORDATA_STACK_SIZE - 1)
> to avoid recursive errors, but I'm not sure it is really meaningful.
> Please remove or rewrite this part if it is not the right way.

I'm not entirely sure either, but it looks like it could protect us
from getting into a tight loop on an error here. Tom (or someone else
who knows that for sure :P), comments?
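
As I read it, the check amounts to "only take the risky path if there is
still room for one more nested error". A standalone sketch of that idea,
with stand-in names and an assumed stack size rather than the real elog.c
variables:

```c
#include <stdbool.h>

/* Stand-in for ERRORDATA_STACK_SIZE; the value here is an assumption. */
#define STACK_SIZE 5

/* Stand-in for errordata_stack_depth; -1 is assumed to mean "empty". */
static int  stack_depth = -1;

/*
 * True if at least one more error frame can be pushed, i.e. it is safe to
 * call code that might itself report an error. When this is false, the
 * caller should skip the conversion and write the message as-is.
 */
static bool
can_risk_nested_error(void)
{
    return stack_depth < STACK_SIZE - 1;
}
```

With the last slot already in use the guard is false, so a failure inside
the conversion cannot push the stack past its limit.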

>> The encoding_to_codepage array needs to go in encnames.c, where other
>> such tables are. Perhaps it can even be integrated in pg_enc2name_tbl
>> as a separate field?
>
> I added pg_enc2name.codepage. Note that this field is needed only
> on Windows, but is now exported for all platforms. If you don't like
> the useless field, the following macro could help.
> #ifdef WIN32
> #define def_enc2name(name, codepage)    { #name, PG_##name, codepage }
> #else
> #define def_enc2name(name, codepage)    { #name, PG_##name }
> #endif
> pg_enc2name pg_enc2name_tbl[] =
> {
>    def_enc2name(SQL_ASCII, 0),
>    def_enc2name(EUC_JP, 20932),
>    ...

Yeah, I think that makes sense. It's not much data, but it's
completely unnecessary :-) I can make that change at commit.
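
Spelled out as something that compiles on its own, the idea would be
roughly as follows. The enum and struct here are cut-down stand-ins for
the real ones, and I'm assuming 0 means "no code page". Note that the
macro takes two arguments on every platform, so each table entry passes a
code page even where it is discarded.

```c
#include <stddef.h>

/* Cut-down stand-ins for the real enum and struct; two encodings only. */
typedef enum
{
    PG_SQL_ASCII = 0,
    PG_EUC_JP
} pg_enc;

typedef struct
{
    const char *name;
    pg_enc      encoding;
#ifdef WIN32
    unsigned    codepage;       /* 0 is assumed to mean "no code page" */
#endif
} pg_enc2name;

#ifdef WIN32
#define def_enc2name(name, codepage)    { #name, PG_##name, codepage }
#else
#define def_enc2name(name, codepage)    { #name, PG_##name }
#endif

static const pg_enc2name pg_enc2name_tbl[] =
{
    def_enc2name(SQL_ASCII, 0),
    def_enc2name(EUC_JP, 20932) /* 20932 is the Windows code page for EUC-JP */
};
```

Outside Windows the second argument is simply dropped by the
preprocessor, so the struct carries no extra field there.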

One other question - you note that WriteConsoleW() "could fail if
stderr is redirected". Are you saying that it will always fail when
stderr is redirected, or only sometimes? If only sometimes, do you know
under which conditions it happens?

If it's always, I assume this just means that the logfile will be in
the database encoding and not in UTF16? Is this what we want, or would
we like the logfile to also be in UTF16? If we can convert it to
UTF16, that would fix the case when you have different databases in
different encodings, wouldn't it? (Even if your editor, unlike the
console subsystem, can view the individual encoding you need, I bet it
can't deal with multiple encodings in the same file)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
