Re: syslog performance when logging big statements

From: "achill(at)matrix" <achill(at)matrix(dot)gatewaynet(dot)com>
To: david(at)lang(dot)hm
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: syslog performance when logging big statements
Date: 2008-07-09 12:37:26
Message-ID: 200807091537.27092.achill@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Στις Wednesday 09 July 2008 03:47:34 ο/η david(at)lang(dot)hm έγραψε:
> On Tue, 8 Jul 2008, Tom Lane wrote:
> > Jeff <threshar(at)threshar(dot)is-a-geek(dot)com> writes:
> >> On Jul 8, 2008, at 8:24 AM, Achilleas Mantzios wrote:
> >>> File sizes of about 3M result in actual logging output of ~ 10Mb.
> >>> In this case, the INSERT *needs* 20 minutes to return. This is
> >>> because the logging through syslog seems to severely slow the system.
> >>> If instead, i use stderr, even with logging_collector=on, the same
> >>> statement needs 15 seconds to return.
> >>
> >> In syslog.conf is the destination for PG marked with a "-"? (ie -/var/
> >> log/pg.log) which tells syslog to not sync after each line logged.
> >> That could explain a large chunk of the difference in time.
> >
> > I experimented with this a bit here. There definitely is an O(N^2)
> > penalty from the repeated strchr() calls, but it doesn't really start
> > to hurt till 1MB or so statement length. Even with that patched,
> > syslog logging pretty much sucks performance-wise. Here are the numbers
> > I got on a Fedora 8 workstation, testing the time to log a statement of
> > the form SELECT length('123456...lots of data, no newlines...7890');
> >
> > statement length 1MB 10MB
> >
> > CVS HEAD 2523ms 215588ms
> > + patch to fix repeated strchr 529ms 36734ms
> > after turning off syslogd's fsync 569ms 5881ms
> > PG_SYSLOG_LIMIT 1024, fsync on 216ms 2532ms
> > PG_SYSLOG_LIMIT 1024, no fsync 242ms 2692ms
> > For comparison purposes:
> > logging statements to stderr 155ms 2042ms
> > execute statement without logging 42ms 520ms
> >
> > This machine is running a cheap IDE drive that caches writes, so
> > the lack of difference between fsync off and fsync on is not too
> > surprising --- on a machine with server-grade drives there'd be
> > a lot more difference. (The fact that there is a difference in
> > the 10MB case probably reflects filling the drive's write cache.)
> >
> > On my old HPUX machine, where fsync really works (and the syslogd
> > doesn't seem to allow turning it off), the 1MB case takes
> > 195957ms with the strchr patch, 22922ms at PG_SYSLOG_LIMIT=1024.
> >
> > So there's a fairly clear case to be made for fixing the repeated
> > strchr, but I also think that there's a case for jacking up
> > PG_SYSLOG_LIMIT. As far as I can tell the current value of 128
> > was chosen *very* conservatively without thought for performance:
> > http://archives.postgresql.org/pgsql-hackers/2000-05/msg01242.php
> >
> > At the time we were looking at evidence that the then-current
> > Linux syslogd got tummyache with messages over about 1KB:
> > http://archives.postgresql.org/pgsql-hackers/2000-05/msg00880.php
> >
> > Some experimentation with the machines I have handy now says that
> >
> > Fedora 8: truncates messages at 2KB (including syslog's header)
> > HPUX 10.20 (ancient): ditto
> > Mac OS X 10.5.3: drops messages if longer than about 1900 bytes
> >
> > So it appears to me that setting PG_SYSLOG_LIMIT = 1024 would be
> > perfectly safe on current systems (and probably old ones too),
> > and would give at least a factor of two speedup for logging long
> > strings --- more like a factor of 8 if syslogd is fsync'ing.
> >
> > Comments? Anyone know of systems where this is too high?
> > Perhaps we should make that change only in HEAD, not in the
> > back branches, or crank it up only to 512 in the back branches?
>
> with linux ext2/ext3 filesystems I have seen similar problems when the
> syslog starts getting large. there are several factors here
>
> 1. fsync after each write unless you have "-" in syslog.conf (only
> available on linux AFAIK)
>
In FreeBSD 7.0 by default it does not fsync (except for kernel messages),
unless the path is prefixed by "-" whereas it syncs.
> 2. ext2/ext3 tend to be very inefficiant when doing appends to large
> files. each write requires that the syslog daemon seek to the end of the
> file (becouse something else may have written to the file in the meantime)
> and with the small block sizes and chaining of indirect blocks this can
> start to be painful when logfiles get up in to the MB range.
>
> note that you see this same problem when you start to get lots of
> files in one directory as well. even if you delete a lot of files the
> directory itself is still large and this can cause serious performance
> problems.
>
> other filesystems are much less sensitive to file (and directory) sizes.
>
> my suggestion would be to first make sure you are doing async writes to
> syslog, and then try putting the logfiles on different filesystems to see
> how they differ. personally I use XFS most of the time where I expect lots
> of files or large files.
>
> David Lang

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message achill@matrix 2008-07-09 12:51:30 Re: syslog performance when logging big statements
Previous Message achill@matrix 2008-07-09 12:31:05 Re: syslog performance when logging big statements