Re: Industrial-Strength Logging

From: Giles Lean <giles(at)nemeton(dot)com(dot)au>
To:
Subject: Re: Industrial-Strength Logging
Date: 2000-06-03 12:59:34
Message-ID: 16919.960037174@nemeton.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-patches

------- Blind-Carbon-Copy

To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Industrial-Strength Logging
In-reply-to: <Pine(dot)LNX(dot)4(dot)21(dot)0006021547090(dot)406-100000(at)localhost(dot)localdomain>
Date: Sat, 03 Jun 2000 22:59:34 +1000
Message-ID: <16919(dot)960037174(at)nemeton(dot)com(dot)au>
From: Giles Lean <giles(at)nemeton(dot)com(dot)au>

On Sat, 3 Jun 2000 01:48:33 +0200 (CEST) Peter Eisentraut wrote:

> Yeah, let's have another logging discussion... :)

Mmm, seems popular. There was a mention on -ports and -general a
couple of weeks ago, and here we are (were) on -patches. I'm moving
this discussion to -hackers (hope that's a good choice) since that is
where Tim Holloway's proposals were discussed late last year.

A start point I found in the archives for Tim's proposal is:
http://www.postgresql.org/mhonarc/pgsql-hackers/1999-10/msg00747.html

I'm not proposing anything that advanced. In particular, I'm not
discussing the -content- of log messages at all. For now it would be
nice to see the logging mechanism improved; changing or improving the
content can be another project.

I don't discuss the current logging implementation except to note that
the backend postgres processes' logging depends on whether the process
is running under postmaster or not, has a controlling terminal or not,
whether a -o option was provided, and whether postgres was compiled to
use syslog. Maybe that functionality can be simplified a bit ... ;-)

One more thing I don't discuss is how the debug log level is set.
Certainly something more sophisticated and dynamically variable than
the current command line method would be nice, but that too can be a
discussion for another day; it isn't much related to -how- the error
messages are tucked away.

Typical logging methods
=======================

(a)(i) write to standard error with redirection to a file

Pro:
- what the code (mostly) does currently
- very easy to set up, just redirect standard error at startup
- efficient and low overhead

Con:
- can't rotate log files
- problematic when there is an I/O error or the filesystem the log
file is on fills up

(a)(ii) write to standard error, with standard error piped to another
process

Pro:
- administrator chooses between (i) and (ii) and can change this
via shutdown and restart, no recompilation needed
- no code changes to backend programs
- clean separation of functionality
- choice of backend logging programs
o Bernstein's logtools
o Apache's rotatelogs
o swatch
o logsurfer
o ...

Con:
- backend can block if the logging process is not reading log
messages fast enough (can the backends generate enough data for
this to be a problem in practice?)
- reliability of message logging is dependent on the log
process
- log messages can be lost if the log process aborts, or is not
started (solution: make portmaster responsible for starting and
restartin the log process)

(b) write to named log file(s)

One way to allow rotation of log files is for the backend
processes to know what log files they write to, and to have them
open them directly without shell redirection. There is some
support for this with the postgres -o option, but no support
for rotating these files that I have seen so far.

In the simplest case, the backend processes open the log file at
when they start and close it when they exit. This allows rotation
of the log file by moving it and waiting for all the currently
running backend processes to finish.

Pro:
- relatively simple code change
- still efficient and low overhead

Con:
- backend processes can run for some time, and postmaster runs
indefinitely, so at least postmaster needs to know about log
file rotation
- doesn't help much for I/O errors or full filesystem

To address these limitations some applications open their log file
for each message and then close it afterward:

Pro:
- nothing holds the log file open for long
- still efficient and low overhead for the actual writing the log
file

Con:
- all error logging has to be via a log routine. This would be
elog(), but there is some use of fprintf(stderr, ...) around the
place that would want to be changed

- there will be some efficiency hit for the open() and close()
calls. This won't be -too- bad since the operating system's
inode cache (or local equivalent) should contain an entry for
the log file, but it is still two more system calls.

Another way to handle rotation with long running processes is to
signal them to re-open their log file, like syslogd is managed:

Pro:
- it's a solution

Con:
- more code in the backend processes
- more communication with the backend processes
- more complication

(c) log via some logging facility such as syslogd

Pro:
- people know syslogd
- syslogd "looks like" the right answer

Con:
- messages are sent to it typically via UDP, so it's easy for them
to get lost
- syslogd is not coded robustly, and can hang, stop listening to
input sources, stop writing to output files, and put wrong
timestamps on messages
- using non-UDP transport to syslogd is non-portable (some systems
allow Unix domain sockets, some named pipes, some neither)
- syslogd can be hard to secure (e.g. to stop it listening on
particular network interfaces)
- do you know what your syslog(3) routine does when it gets a
write(2) error?
- it is not supported by all (any?) vendors to have syslogd write
to a pipe, so on-the-fly processing of error messages is not
possible; they have to be written to the filesystem
- Unix specific(?)

(d) log into the database

Pro:
- we've got a nice tool, let's use it

Con:
- chicken-and-egg problem of logging serious/fatal errors
- infinite recursion problem of logging messages that cause more errors
- error messages are predominantly text messages, very susceptible
to processing with perl, grep and friends. It isn't crystal
clear that having them inside the database helps a lot
- using the database is friendly to a DBA, but quite possibly not
to the system adminstrator who very possibly knows no SQL but
who (on a Unix system, at least) has many text log files to look
after
- postmaster doesn't access tables according to the previous
discussion

Recommendations
===============

From here on it's definitely all IMHO! Your mileage may vary.

To restate the methods I discussed:

(a)(i) standard error to file
(ii) standard error piped to a process
(b) named log file(s)
(c) syslogd
(d) database

I would recommend (a)(ii), with (a)(i) available for anyone who wants
it. (Someone who has high load 9-5 but who can shut down daily might
be happy writing directly to a log file, for example.)

Using the database is too complex/fragile, using syslogd fails too
often, and having every process involved manage log files as in (b)
seems like more work than is necessary unless/until logging via a
single process turns out to be a bottleneck.

There has been a suggestion to use the Apache "rotatelogs" program to
do the logging. That program is perhaps a little -too- simplistic:

- - no timestamp option
- - no way to make ad-hoc requests for log file rotation
- - write errors cause it to exit (... oops!)
- - no fsync option

On the other hand the program is only 100 lines. Writing a robust
version isn't rocket science, and if postmaster is taught to start it,
then pg_ctl looks like a natural place to control log file rotation
and possibly one day log levels as well.

Regards,

Giles

P.S. Yes, I've some coding time to offer. I'll wait and see what is
liked before I do anything though. ;-)

------- End of Blind-Carbon-Copy

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Louis-David Mitterrand 2000-06-03 15:22:56 child table doesn't inherit PRIMARY KEY?
Previous Message carl garland 2000-06-03 12:18:11 Problems with Large Databases

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-06-03 14:00:46 Re: New warning code for missing FROM relations
Previous Message Tatsuo Ishii 2000-06-03 08:38:27 Re: [GENERAL] PostgreSQL v7.0.1 Released

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2000-06-03 16:36:58 Re: Re: Industrial-Strength Logging
Previous Message Peter Eisentraut 2000-06-02 23:48:33 Re: Industrial-Strength Logging