Skip site navigation (1) Skip section navigation (2)

Logging checkpoints and other slowdown causes

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Logging checkpoints and other slowdown causes
Date: 2007-04-01 07:27:46
Message-ID: Pine.GSO.4.64.0704010051050.14920@westnet.com (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
This patch puts entries in the log file around some backend events that 
regularly cause client pauses, most of which are visible only when running 
with log_min_messages = debug2.  The main focus is regularly occuring 
checkpoints, from either running out of segments or hitting a timeout. 
This is an example of what gets written to the logs at checkpoint time:

DEBUG2:  checkpoint required (timeout passed)
DEBUG2:  checkpoint starting
DEBUG2:  checkpoint flushing buffer pool
DEBUG2:  checkpoint database fsync starting
DEBUG2:  checkpoint database fsync complete
DEBUG1:  checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms 
sync=3716.5 ms
DEBUG2:  checkpoint complete; 0 transaction log file(s) added, 0 removed, 
0 recycled

The idea here is that if you want to really watch what your system is 
doing, either to track down a problem or for benchmarking, you log at 
DEBUG2 and timestamp the logs with something like log_line_prefix = 
'[%m]:'.  Then line up your timestamped transactions with the logs to 
figure out what happened during bad periods.  This is very easy to do with 
the timestamped pgbench latency logs for example, so you can put them in 
time sequence order and see something like "oh, those long transactions 
all came in between the fsync start and end".  The summary line with the 
write/sync timing information has a broader use and suggests when you need 
to go deeper, which is why I put that one at DEBUG1.

I also adjust some other log messages to make it easier to run the system 
at DEBUG2 without going completely crazy, like moving individual WAL 
segment recycles to DEBUG3 and standardizing the format of the fork/exit 
messages so they're easier to filter out.

My main concern about this code is how it adds several gettimeofday calls 
and complicated string building inside FlushBufferPool, all of which is 
wasted if this data isn't actually going to get logged.  I'd like to have 
something that tells me whether I should even bother, but I didn't want to 
poke into the elog variables for fear of adding a dependency on its 
current structure to here.  Ideally, it would be nice to call something 
like WillLog(DEBUG2) and get a boolean back saying whether something at 
that log level will be output anywhere; I don't know enough about the 
logging code to add such a thing to it myself.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

Attachment: checkpointlog.txt
Description: text/plain (8.1 KB)

Responses

pgsql-hackers by date

Next:From: Peter EisentrautDate: 2007-04-01 07:40:09
Subject: Re: Macros for typtype (was Re: Arrays of Complex Types)
Previous:From: Tom LaneDate: 2007-04-01 06:52:49
Subject: Re: Last minute mini-proposal (I know, I know)forPQexecf()

pgsql-patches by date

Next:From: Peter EisentrautDate: 2007-04-01 07:40:09
Subject: Re: Macros for typtype (was Re: Arrays of Complex Types)
Previous:From: Tom LaneDate: 2007-04-01 03:22:52
Subject: Re: Current enums patch

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group