Re: BUG #5661: The character encoding in logfile is confusing.

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, tkbysh2000(at)yahoo(dot)co(dot)jp, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: BUG #5661: The character encoding in logfile is confusing.
Date: 2010-09-25 06:48:34
Message-ID: 4C9D9B42.5060007@postnewspapers.com.au
Lists: pgsql-bugs pgsql-hackers

On 22/09/2010 9:41 PM, Tom Lane wrote:
> Craig Ringer<craig(at)postnewspapers(dot)com(dot)au> writes:
>> On 22/09/2010 5:45 PM, Peter Eisentraut wrote:
>>> We need to produce the log output in the server encoding, because that's
>>> how we need to send it to the client.
>
>> That doesn't mean it can't be recoded for writing to the log file,
>> though. Perhaps it needs to be. It should be reasonably practical to
>> detect when the database and log encoding are the same and avoid the
>> transcoding performance penalty, not that it's big anyway.
>
> We have seen ... and rejected ... such proposals before. The problem is
> that "transcode to some other encoding" is not a simple and guaranteed
> error-free operation. As an example, if you choose to name some table
> using a character that doesn't exist in the log encoding, you have just
> ensured that no message about that table will ever get to the log.

Well, an arguably reasonable if still suboptimal approach is to mask out
characters without any representation in the target encoding, replacing
them with a substitute ("?" or whatever). The rest of the log message is
still emitted that way.
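
A rough sketch of that idea, in Python rather than backend C, just to
show the behaviour. The function name recode_for_log and its parameters
are made up for illustration and the Python codec names are stand-ins
for Pg's encoding names; nothing here is existing PostgreSQL code.

```python
import codecs


def recode_for_log(message: bytes, server_encoding: str, log_encoding: str) -> bytes:
    """Recode a log message, masking characters the log encoding lacks.

    Hypothetical helper: names and signature are assumptions for this sketch.
    """
    # Skip the transcoding cost entirely when both names resolve to the
    # same codec (e.g. "UTF8" and "utf-8").
    if codecs.lookup(server_encoding).name == codecs.lookup(log_encoding).name:
        return message

    # Bytes that are invalid in the server encoding become U+FFFD here
    # instead of raising, so the message is never dropped.
    text = message.decode(server_encoding, errors="replace")

    # Characters with no representation in the log encoding become "?".
    return text.encode(log_encoding, errors="replace")


# A table name containing a character Latin-1 cannot represent:
# recode_for_log("table \u8868".encode("utf-8"), "utf-8", "latin-1")
# returns b"table ?"
```

The message still reaches the log; only the unrepresentable character
is lost. This does nothing about the memory cost of transcoding.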

Currently, Pg may as well be emitting "!@#!#!#!@#$!@#$" for these log
records. It's garbage unless the user's editor/log viewer/whatever
happens to use the encoding of that set of messages, turning all the
others into garbage instead. To interpret them, I had to

It's not a big deal with languages that mostly use the 7-bit ASCII range
that most encodings share, but for Russian, Chinese, Japanese, Thai, the
various Indian languages, etc. it's pretty awful, as seen in
Mikio's example log files.

> Nice way to hide your activities from the DBA ;-)

Emitting messages in the wrong encoding doesn't do the DBA any favours
either. Automated log analysis and reporting will have a hard time
dealing with the logs, and the DBA will have to keep on switching
encodings in their editor/viewer to interpret or search the logs.
Assuming they know how, and know they need to.

> Transcoding also
> eats memory, which might be in exceedingly short supply while trying
> to report an "out of memory" error; and IIRC there are some other
> failure scenarios to be concerned about.

Yep, that's certainly a problem. Pre-transcoding them on backend start
isn't particularly desirable (wasted startup time, memory) and neither
is pre-allocating extra memory for use on fatal exit paths.

OTOH, don't the current message translations cost at least some
memory too?

I don't have a good answer for this issue. Only rather less-than-good
ideas like: mmap() a file the postmaster generates that contains various
fatal messages, already in the right encodings/translations, with an
offset table at the front? Icky, but effective and doesn't waste
precious shared memory or produce new unsharable allocations in the
backends that'll only ever get used when something breaks.
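
To make the file layout concrete, here is a minimal Python sketch of
one possible format: a message count, then a table of (offset, length)
pairs, then the message bytes. The layout, the struct formats and all
function names are assumptions for illustration only. A real version
would be C in the postmaster and backends, and Python's mmap only
stands in for the mmap() call.

```python
import mmap
import struct

HEADER = struct.Struct("<I")   # number of messages in the file
ENTRY = struct.Struct("<II")   # (offset, length) of one message


def write_message_file(path, messages):
    """Write pre-encoded messages (a list of bytes) behind an offset table."""
    offset = HEADER.size + ENTRY.size * len(messages)
    entries = []
    for msg in messages:
        entries.append((offset, len(msg)))
        offset += len(msg)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(messages)))
        for off, length in entries:
            f.write(ENTRY.pack(off, length))
        for msg in messages:
            f.write(msg)


def open_message_file(path):
    """Map the file read-only; the mapping stays valid after the file closes."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_message(mapped, index):
    """Look up message number `index` through the offset table."""
    (count,) = HEADER.unpack_from(mapped, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    off, length = ENTRY.unpack_from(mapped, HEADER.size + ENTRY.size * index)
    return mapped[off:off + length]
```

The lookup is two fixed-size reads and a slice. Note that the slice
copies the bytes out in this Python version; C code would hand the
mapped pointer and length straight to write().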

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
