Re: BUG #5661: The character encoding in logfile is confusing.

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: tkbysh2000(at)yahoo(dot)co(dot)jp
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5661: The character encoding in logfile is confusing.
Date: 2010-09-18 02:17:29
Message-ID: 4C942139.4090706@postnewspapers.com.au
Lists: pgsql-bugs pgsql-hackers

On 09/17/2010 01:10 PM, tkbysh2000(at)yahoo(dot)co(dot)jp wrote:

> BTW, I found a third character encoding in the file, Shift_JIS. The
> attached file includes lines in all 3 character encodings.
> For your reference:
> Shift_JIS: Default encoding of Japanese Windows. I found this problem
> on a PostgreSQL server which is running as a Windows service.
> EUC_JP: A very common encoding on Japanese Unix. I guess that the
> developer who worked on this was on some Unix or Linux.
> UTF-8: A major encoding in Japan, especially in relation to Java. And I
> specified it as the default encoding for all of my databases.

Thanks for that.

> I didn't edit the log file, to avoid a text editor changing some data
> when saving it. So the attached log file covers one service run from
> start to end. But the log file is very small. Total size is 7 kB.

Good plan. Thanks.

> And client code is not attached, because the badly encoded messages
> are the startup and shutdown messages.
> So you can easily find this problem. They are at the top and end of the
> log file.

Yes, the mismatched encodings in the data are clear and obvious.

Given that the messages are coming purely from PostgreSQL, not client
code, I'm now wondering if what we're dealing with is mismatched
encodings in the translation files, where some messages were translated
with a different encoding to other messages.

One of the correctly encoded messages is "Unexpected EOF received on
client connection".

One of the incorrectly encoded (Shift_JIS) messages is "Fast Shutdown
request received". Another is "Aborting any active transactions".

I can find the correctly encoded messages in
share/locale/ja/LC_MESSAGES/postgres-9.0.mo

The incorrectly encoded messages appear in the same file, but are
encoded in UTF-8 in that file despite being output to the logs in
Shift_JIS. For example, with the badly encoded data from the logs
extracted into the file 'x':

$ python
>>> x = open("x").read()
>>> x
'\x8d\x82\x91\xac\x83V\x83\x83\x83b\x83g\x83_\x83E\x83\x93\x97v\x8b\x81\x82\xf0\x8e\xf3\x82\xaf\x8e\xe6\x82\xe8\x82\xdc\x82\xb5\x82\xbd\r\n'
>>> print x.decode("shift-jis")
高速シャットダウン要求を受け取りました

$ grep '高速シャットダウン要求を受け取りました' *
Binary file postgres-9.0.mo matches
$
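
The same check can be folded into one step. This is only a minimal sketch, assuming Python 3; the helper name mo_contains is made up for illustration and is not part of PostgreSQL or gettext. It decodes the bytes as they appeared in the log, re-encodes the text as UTF-8, and looks for those bytes in the .mo file:

```python
from pathlib import Path


def mo_contains(mo_path, logged: bytes, logged_encoding: str = "shift_jis") -> bool:
    """Return True if the logged message is stored as UTF-8 in the .mo file.

    mo_contains is a hypothetical helper; it does a raw byte search and
    does not parse the .mo format.
    """
    # Decode the bytes in the encoding they had in the log, dropping the
    # trailing line ending.
    text = logged.decode(logged_encoding).strip()
    # Look for the UTF-8 form of the same text in the catalogue.
    return text.encode("utf-8") in Path(mo_path).read_bytes()
```

Called as mo_contains("postgres-9.0.mo", open("x", "rb").read()), it answers the same question as the decode-then-grep steps above, without depending on the terminal's encoding.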

So - either something in the pipeline is "helpfully" converting your
error messages, or your locale files aren't the same as mine. I doubt
the latter; it seems almost impossible that just a few messages would be
converted to Shift_JIS by accident in the Windows release only. So the
question now is where the messages are converted from UTF-8 to Shift_JIS
and why that conversion is being applied inconsistently.
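
To see which log lines are affected, each line can be tagged with the first encoding that decodes it cleanly. This is a rough heuristic sketched for Python 3, not a reliable detector: the function names and the candidate order are assumptions, and short or ambiguous byte sequences can decode under more than one encoding.

```python
# Candidate encodings, strictest first. Shift_JIS is tried last because
# UTF-8 and EUC-JP reject most Shift_JIS byte sequences.
CANDIDATES = ("ascii", "utf-8", "euc_jp", "shift_jis")


def guess_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes raw without error."""
    for name in CANDIDATES:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"


def classify_log(path):
    """Yield (line number, guessed encoding) for each line of a log file."""
    with open(path, "rb") as fh:  # binary mode: no implicit decoding
        for lineno, raw in enumerate(fh, 1):
            yield lineno, guess_encoding(raw.rstrip(b"\r\n"))
```

Running classify_log over the attached log and grouping the results by encoding would show whether the Shift_JIS lines are limited to the startup and shutdown messages.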

I'll try to have a look and see what I can find.

--
Craig Ringer
