Skip site navigation (1) Skip section navigation (2)

Re: syslog performance when logging big statements

From: david(at)lang(dot)hm
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>, Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: syslog performance when logging big statements
Date: 2008-07-09 00:47:34
Message-ID: alpine.DEB.1.10.0807081742280.5919@asgard (view raw or flat)
Thread:
Lists: pgsql-performance
On Tue, 8 Jul 2008, Tom Lane wrote:

> Jeff <threshar(at)threshar(dot)is-a-geek(dot)com> writes:
>> On Jul 8, 2008, at 8:24 AM, Achilleas Mantzios wrote:
>>> File sizes of about 3M result in actual logging output of ~ 10Mb.
>>> In this case, the INSERT *needs* 20 minutes to return. This is
>>> because the logging through syslog seems to severely slow the system.
>>> If instead, i use stderr, even with logging_collector=on, the same
>>> statement needs 15 seconds to return.
>
>> In syslog.conf is the destination for PG marked with a "-"? (ie -/var/
>> log/pg.log) which tells syslog to not sync after each line logged.
>> That could explain a large chunk of the difference in time.
>
> I experimented with this a bit here.  There definitely is an O(N^2)
> penalty from the repeated strchr() calls, but it doesn't really start
> to hurt till 1MB or so statement length.  Even with that patched,
> syslog logging pretty much sucks performance-wise.  Here are the numbers
> I got on a Fedora 8 workstation, testing the time to log a statement of
> the form SELECT length('123456...lots of data, no newlines...7890');
>
> statement length			1MB		10MB
>
> CVS HEAD				2523ms		215588ms
> + patch to fix repeated strchr		 529ms		 36734ms
> after turning off syslogd's fsync	 569ms		  5881ms
> PG_SYSLOG_LIMIT 1024, fsync on		 216ms		  2532ms
> PG_SYSLOG_LIMIT 1024, no fsync		 242ms		  2692ms
> For comparison purposes:
> logging statements to stderr		 155ms		  2042ms
> execute statement without logging	  42ms		   520ms
>
> This machine is running a cheap IDE drive that caches writes, so
> the lack of difference between fsync off and fsync on is not too
> surprising --- on a machine with server-grade drives there'd be
> a lot more difference.  (The fact that there is a difference in
> the 10MB case probably reflects filling the drive's write cache.)
>
> On my old HPUX machine, where fsync really works (and the syslogd
> doesn't seem to allow turning it off), the 1MB case takes
> 195957ms with the strchr patch, 22922ms at PG_SYSLOG_LIMIT=1024.
>
> So there's a fairly clear case to be made for fixing the repeated
> strchr, but I also think that there's a case for jacking up
> PG_SYSLOG_LIMIT.  As far as I can tell the current value of 128
> was chosen *very* conservatively without thought for performance:
> http://archives.postgresql.org/pgsql-hackers/2000-05/msg01242.php
>
> At the time we were looking at evidence that the then-current
> Linux syslogd got tummyache with messages over about 1KB:
> http://archives.postgresql.org/pgsql-hackers/2000-05/msg00880.php
>
> Some experimentation with the machines I have handy now says that
>
> Fedora 8:		truncates messages at 2KB (including syslog's header)
> HPUX 10.20 (ancient):	ditto
> Mac OS X 10.5.3:	drops messages if longer than about 1900 bytes
>
> So it appears to me that setting PG_SYSLOG_LIMIT = 1024 would be
> perfectly safe on current systems (and probably old ones too),
> and would give at least a factor of two speedup for logging long
> strings --- more like a factor of 8 if syslogd is fsync'ing.
>
> Comments?  Anyone know of systems where this is too high?
> Perhaps we should make that change only in HEAD, not in the
> back branches, or crank it up only to 512 in the back branches?

with linux ext2/ext3 filesystems I have seen similar problems when the 
syslog starts getting large. there are several factors here

1. fsync after each write unless you have "-" in syslog.conf (only 
available on linux AFAIK)

2. ext2/ext3 tend to be very inefficiant when doing appends to large 
files. each write requires that the syslog daemon seek to the end of the 
file (becouse something else may have written to the file in the meantime) 
and with the small block sizes and chaining of indirect blocks this can 
start to be painful when logfiles get up in to the MB range.

note that you see this same problem when you start to get lots of 
files in one directory as well. even if you delete a lot of files the 
directory itself is still large and this can cause serious performance 
problems.

other filesystems are much less sensitive to file (and directory) sizes.

my suggestion would be to first make sure you are doing async writes to 
syslog, and then try putting the logfiles on different filesystems to see 
how they differ. personally I use XFS most of the time where I expect lots 
of files or large files.

David Lang

In response to

Responses

pgsql-performance by date

Next:From: Tatsuo IshiiDate: 2008-07-09 01:31:34
Subject: Re: syslog performance when logging big statements
Previous:From: Tom LaneDate: 2008-07-08 21:39:17
Subject: Re: syslog performance when logging big statements

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group