Skip site navigation (1) Skip section navigation (2)

Re: Instrument checkpoint sync calls

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instrument checkpoint sync calls
Date: 2010-11-30 16:38:13
Message-ID: 4CF52875.9040806@2ndquadrant.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Jeff Janes wrote:
> For the individual file sync times emitted under debug1, it would be
> very handy if the file being synced was identified, for example
> "relation base/16384/16523". Rather than being numbered sequentially
> within a given checkpoint.
>   

I was numbering them sequentially so that it's straightforward to graph 
the sync times in an external analysis tool, but the relation data is 
helpful too.  New patch reflecting all upthread suggestions is 
attached.  The output looks like this now at DEBUG1:

LOG:  checkpoint starting: xlog
DEBUG:  checkpoint sync: number=1 file=base/16424/11645 
time=11589.549000 msec
DEBUG:  checkpoint sync: number=2 file=base/16424/16438 time=16.148000 msec
DEBUG:  checkpoint sync: number=3 file=base/16424/16437 time=53.530000 msec
DEBUG:  checkpoint sync: number=4 file=base/16424/16447 time=10.214000 msec
DEBUG:  checkpoint sync: number=5 file=base/16424/11607 time=1.499000 msec
DEBUG:  checkpoint sync: number=6 file=base/16424/16425_fsm 
time=2.921000 msec
DEBUG:  checkpoint sync: number=7 file=base/16424/16437.1 time=4.237000 msec
DEBUG:  checkpoint sync: number=8 file=base/16424/16428_fsm 
time=1.654000 msec
DEBUG:  checkpoint sync: number=9 file=base/16424/16442 time=7.920000 msec
DEBUG:  checkpoint sync: number=10 file=base/16424/16428_vm 
time=2.613000 msec
DEBUG:  checkpoint sync: number=11 file=base/16424/11618 time=1.468000 msec
DEBUG:  checkpoint sync: number=12 file=base/16424/16437_fsm 
time=2.638000 msec
DEBUG:  checkpoint sync: number=13 file=base/16424/16428 time=2.883000 msec
DEBUG:  checkpoint sync: number=14 file=base/16424/16425 time=3.369000 msec
DEBUG:  checkpoint sync: number=15 file=base/16424/16437_vm 
time=8.686000 msec
DEBUG:  checkpoint sync: number=16 file=base/16424/16425_vm 
time=5.984000 msec
LOG:  checkpoint complete: wrote 2074 buffers (50.6%); 0 transaction log 
file(s) added, 0 removed, 3 recycled; write=0.617 s, sync=11.715 s, 
total=22.167 s; sync files=16, longest=11.589 s, average=0.724 s

I kept the units for the DEBUG level ones in msec because that's a 
better scale for the common really short syncs during good behavior.  
But the summary info in seconds now appears at the end of the existing 
"checkpoint complete" message, so only one line to parse for those 
looking to analyze the gross checkpoint data.  That looks to work well 
enough for finding situations like the big ext3 spikes.  You can easily 
see one in this example by the fact that "longest=11.589 s" is almost 
the entirety of "sync=11.715 s".  That's the really key thing there's 
currently no visibility into, that's made obvious with this patch.

This might be ready for some proper review now.  I know there's at least 
one blatant bug still in here I haven't found yet, related to how the 
averages are computed.  I saw this once:

LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log 
file(s) added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, 
total=0.001 s; sync files=0, longest=0.000 s, 
average=-9223372036854775808.-2147483 s

After an immediate checkpoint, so at least one path not quite right yet.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD



Attachment: log-sync-v3.patch
Description: text/x-patch (5.6 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2010-11-30 16:40:10
Subject: Re: crash-safe visibility map, take three
Previous:From: Tom LaneDate: 2010-11-30 16:37:13
Subject: Re: crash-safe visibility map, take three

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group