Re: Instrument checkpoint sync calls

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instrument checkpoint sync calls
Date: 2010-11-30 18:31:29
Message-ID: AANLkTinz+D9=+aKf=4iBY3QiE++_rTQTKSpseHJCYOtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2010/11/30 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Jeff Janes wrote:
>>
>> For the individual file sync times emitted under debug1, it would be
>> very handy if the file being synced was identified, for example
>> "relation base/16384/16523". Rather than being numbered sequentially
>> within a given checkpoint.
>>
>
> I was numbering them sequentially so that it's straightforward to graph the
> sync times in an external analysis tool, but the relation data is helpful
> too.  New patch reflecting all upthread suggestions is attached.  The output
> looks like this now at DEBUG1:
>
> LOG:  checkpoint starting: xlog
> DEBUG:  checkpoint sync: number=1 file=base/16424/11645 time=11589.549000
> msec
> DEBUG:  checkpoint sync: number=2 file=base/16424/16438 time=16.148000 msec
> DEBUG:  checkpoint sync: number=3 file=base/16424/16437 time=53.530000 msec
> DEBUG:  checkpoint sync: number=4 file=base/16424/16447 time=10.214000 msec
> DEBUG:  checkpoint sync: number=5 file=base/16424/11607 time=1.499000 msec
> DEBUG:  checkpoint sync: number=6 file=base/16424/16425_fsm time=2.921000
> msec
> DEBUG:  checkpoint sync: number=7 file=base/16424/16437.1 time=4.237000 msec
> DEBUG:  checkpoint sync: number=8 file=base/16424/16428_fsm time=1.654000
> msec
> DEBUG:  checkpoint sync: number=9 file=base/16424/16442 time=7.920000 msec
> DEBUG:  checkpoint sync: number=10 file=base/16424/16428_vm time=2.613000
> msec
> DEBUG:  checkpoint sync: number=11 file=base/16424/11618 time=1.468000 msec
> DEBUG:  checkpoint sync: number=12 file=base/16424/16437_fsm time=2.638000
> msec
> DEBUG:  checkpoint sync: number=13 file=base/16424/16428 time=2.883000 msec
> DEBUG:  checkpoint sync: number=14 file=base/16424/16425 time=3.369000 msec
> DEBUG:  checkpoint sync: number=15 file=base/16424/16437_vm time=8.686000
> msec
> DEBUG:  checkpoint sync: number=16 file=base/16424/16425_vm time=5.984000
> msec
> LOG:  checkpoint complete: wrote 2074 buffers (50.6%); 0 transaction log
> file(s) added, 0 removed, 3 recycled; write=0.617 s, sync=11.715 s,
> total=22.167 s; sync files=16, longest=11.589 s, average=0.724 s
>
> I kept the units for the DEBUG level ones in msec because that's a better
> scale for the common really short syncs during good behavior.  But the
> summary info in seconds now appears at the end of the existing "checkpoint
> complete" message, so only one line to parse for those looking to analyze
> the gross checkpoint data.  That looks to work well enough for finding
> situations like the big ext3 spikes.  You can easily see one in this example
> by the fact that "longest=11.589 s" is almost the entirety of "sync=11.715
> s".  That's the really key thing there's currently no visibility into,
> that's made obvious with this patch.

wonderfull.

>
> This might be ready for some proper review now.  I know there's at least one
> blatant bug still in here I haven't found yet, related to how the averages
> are computed.  I saw this once:
>
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
> added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.001 s;
> sync files=0, longest=0.000 s, average=-9223372036854775808.-2147483 s
>
> After an immediate checkpoint, so at least one path not quite right yet.
>
> --
> Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-11-30 18:48:39 Re: [GENERAL] column-level update privs + lock table
Previous Message Robert Haas 2010-11-30 18:22:31 Re: [GENERAL] column-level update privs + lock table