Re: main log encoding problem

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Alexander Law <exclusion(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, yi(dot)codeplayer(at)gmail(dot)com, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: main log encoding problem
Date: 2012-07-19 04:22:46
Message-ID: 50078B96.8020206@ringerc.id.au
Lists: pgsql-bugs pgsql-general pgsql-hackers

On 07/18/2012 11:16 PM, Alexander Law wrote:
> Hello!
>
> May I propose a solution and step up?
>
> I've read the discussion of bug #5800 and here are my 2 cents.
> To make things clear let me give an example.
> I am a PostgreSQL hosting provider and I let my customers create
> any databases they wish.
> I have clients all over the world (so they can create databases with
> different encodings).
>
> The question is - what do I (as admin) want to see in my postgresql log,
> containing errors from all the databases?
> IMHO we should consider two requirements for the log.
> First, the file should be readable with a generic text viewer. Second,
> it should be as useful and complete as possible.
>
> I see the following solutions.
> A. We have different logfiles for each database with different encodings.
> Then all our logs will be readable, but we have to look at them one by
> one and it's inconvenient at least.
> Moreover, our log reader should understand what encoding to use for
> each file.
>
> B. We have one logfile with the operating system encoding.
> The first downside is that the logs can be different for different OSes.
> The second is that Windows has a non-Unicode system encoding.
> Such an encoding can't represent all the national characters, so
> at best I will get ??? in the log.
>
> C. We have one logfile with UTF-8.
> Pros: Log messages of all our clients can fit in it. We can use any
> generic editor/viewer to open it.
> Nothing changes for Linux (and other OSes with UTF-8 encoding).
> Cons: All the strings written to the log file should go through some
> conversion function.
>
> I think that the last solution is the solution. What is your opinion?

Implementing any of these isn't trivial - especially making sure that
messages emitted to stderr by things like segfaults and the dynamic
linker are always correctly encoded. Ensuring that the logging collector
knows when setlocale() has been called to change the encoding and
translation of system messages, handling the different logging output
methods, etc - it's going to be fiddly.

I have some performance concerns about the transcoding required for (b)
or (c), but realistically it's already the norm to convert all the data
sent to and from clients. Conversion for logging should not be a
significant additional burden. Conversion can be short-circuited
when source and destination encodings are the same, as in the common
case of logging in UTF-8 or to a dedicated file.
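
As a rough illustration of that short-circuit, here is a minimal Python
sketch. It is not PostgreSQL code; the function name to_log_bytes and its
parameters are made up for the example, and it assumes each message
arrives as bytes in the encoding of the database that produced it.

```python
import codecs


def to_log_bytes(message: bytes, source_encoding: str,
                 log_encoding: str = "utf-8") -> bytes:
    """Return `message` re-encoded for the log file (hypothetical helper)."""
    # Normalise the names so that "UTF8", "utf-8" and "utf_8" compare equal.
    same = codecs.lookup(source_encoding).name == codecs.lookup(log_encoding).name
    if same:
        # Short-circuit: nothing to convert, hand back the original bytes.
        return message
    # Otherwise decode from the database encoding and re-encode for the log.
    # Characters the log encoding cannot represent come out as "?".
    text = message.decode(source_encoding, errors="replace")
    return text.encode(log_encoding, errors="replace")
```

With a UTF-8 log, a message from a UTF-8 database passes through
untouched, while one from a KOI8-R database is converted. Logging to a
narrower encoding shows the "???" effect described for option (b).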

I suspect the eventual choice will be "all of the above":

- Default to (b) or (c); both have pros and cons. I favour (c) with a
UTF-8 BOM to warn editors, but (b) is nice for people whose DBs are all
in the system locale.

- Allow (a) for people who have many different DBs in many different
encodings, do high volume logging, and want to avoid conversion
overhead. Let them deal with the mess, just provide an additional % code
for the encoding so they can name their per-DB log files to indicate the
encoding.
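
To make the "additional % code" idea concrete, here is a small Python
sketch of how a per-DB log file name might be expanded. Everything in it
is hypothetical: %E as the escape for the encoding is invented for the
example, and reusing %d (the database name in log_line_prefix) in a file
name pattern is likewise an assumption, not something PostgreSQL
supports today.

```python
def expand_log_name(pattern: str, database: str, encoding: str) -> str:
    """Expand hypothetical %d / %E escapes in a log file name pattern."""
    # %d -> database name, %E -> database encoding (invented), %% -> "%".
    substitutions = {"d": database, "E": encoding, "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern) and pattern[i + 1] in substitutions:
            out.append(substitutions[pattern[i + 1]])
            i += 2  # skip the "%" and the escape letter
        else:
            out.append(ch)  # ordinary character or unknown escape: keep as-is
            i += 1
    return "".join(out)
```

For example, expand_log_name("postgresql-%d-%E.log", "shop", "KOI8R")
gives "postgresql-shop-KOI8R.log", so a log reader can pick the right
encoding from the file name alone.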

The main issue is just that code needs to be prototyped, cleaned up, and
submitted. So far nobody's cared enough to design it, build it, and get
it through patch review. I've just foolishly volunteered myself to work
on an automated crash-test system for virtual plug-pull testing, so I'm
not stepping up.

--
Craig Ringer
