Re: PATCH: regular logging of checkpoint progress

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)2ndQuadrant(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: regular logging of checkpoint progress
Date: 2011-08-27 07:39:14
Message-ID: 4E589F22.1020408@2ndQuadrant.com
Lists: pgsql-hackers

On 08/27/2011 12:01 AM, Noah Misch wrote:
> On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:
>
>> 1. collect pg_stat_bgwriter stats
>> 2. run pgbench for 10 minutes
>> 3. collect pg_stat_bgwriter stats (to compute difference with (1))
>> 4. kill the postmaster
>>
>> The problem is that when checkpoint stats are collected, there might be a
>> checkpoint in progress and in that case the stats are incomplete. In some
>> cases (especially with very small db blocks) this has significant impact
>> because the checkpoints are less frequent.
>>
> Could you remove this hazard by adding a step "2a. psql -c CHECKPOINT"?
>

That's what I do in pgbench-tools, and it helps a lot. It makes it
easier to identify when the checkpoint kicks in if you know it's
approximately the same time after each test run begins, given similar
testing parameters. That said, it's hard to eliminate all of the edge
conditions here.
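
As a minimal sketch of steps 1 and 3 from the quoted procedure, the snippet below subtracts one pg_stat_bgwriter snapshot from another. The helper name `bgwriter_delta` is hypothetical, the snapshot values are made up for illustration, and the column list is an assumption that may need adjusting for your server version. Actually fetching the snapshots (and issuing the CHECKPOINT between steps 2 and 3) is left out.

```python
# Counter columns assumed to be present in pg_stat_bgwriter;
# adjust this list to match your server version.
COUNTERS = (
    "checkpoints_timed",
    "checkpoints_req",
    "buffers_checkpoint",
    "buffers_clean",
    "buffers_backend",
)


def bgwriter_delta(before, after):
    """Return after - before for each counter (hypothetical helper)."""
    return {name: after[name] - before[name] for name in COUNTERS}


# Made-up snapshot values, purely for illustration.
before = {
    "checkpoints_timed": 10,
    "checkpoints_req": 3,
    "buffers_checkpoint": 5000,
    "buffers_clean": 200,
    "buffers_backend": 1500,
}
after = {
    "checkpoints_timed": 10,
    "checkpoints_req": 6,
    "buffers_checkpoint": 9000,
    "buffers_clean": 260,
    "buffers_backend": 2100,
}

delta = bgwriter_delta(before, after)
```

If the second snapshot is taken while a checkpoint is still running, `buffers_checkpoint` in the delta only reflects the part written so far, which is the hazard the forced CHECKPOINT is meant to reduce.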

For example, imagine that you're consuming WAL files such that you hit
checkpoint_segments every 4 minutes. In a 10 minute test run, a
checkpoint will start at 4:00 and finish at around 6:00 (with
checkpoint_completion_target=0.5). The next will start at 8:00 and
should finish at around 10:00--right at the end of when the test ends.
Given the variation that sync timing and rounding issues in the write
phase add to things, you can expect that some test runs will include
stats from 2 checkpoints, while others will end the test just before the
second one finishes. It does throw the numbers off a bit.
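
The arithmetic in this example can be sketched as follows. This is a simplified model under stated assumptions, not how the server schedules anything: a checkpoint is assumed to start every `interval_min` minutes after the test begins, and its write phase is assumed to last `interval_min * completion_target` minutes. The function names and the `jitter_min` parameter are hypothetical.

```python
def checkpoint_windows(interval_min, completion_target, test_len_min):
    """List (start, finish) times in minutes for checkpoints starting in the test.

    Simplified model (an assumption): one checkpoint starts every
    interval_min minutes, and each takes interval_min * completion_target
    minutes to finish.
    """
    windows = []
    start = interval_min
    while start < test_len_min:
        windows.append((start, start + interval_min * completion_target))
        start += interval_min
    return windows


def completed(windows, test_len_min, jitter_min=0.0):
    """Count checkpoints whose finish time, shifted by jitter, is inside the test."""
    return sum(1 for _, finish in windows if finish + jitter_min <= test_len_min)


# A checkpoint every 4 minutes, checkpoint_completion_target=0.5, 10 minute run.
windows = checkpoint_windows(4, 0.5, 10)
on_time = completed(windows, 10)                   # second checkpoint just makes it
slightly_late = completed(windows, 10, jitter_min=0.1)  # second checkpoint is cut off
```

In this model the second checkpoint finishes exactly at the 10 minute mark, so a delay of a few seconds is enough to change the count from 2 to 1.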

To avoid this when it pops up, I normally aim to push up to where there
are >=4 checkpoints per test run, just so whether I get n or n-1 of them
doesn't impact results as much. But that normally takes doubling the
length of the test to 20 minutes. As it will often take me days of test
time to plow through exploring just a couple of parameters, I'm
sympathetic to Tomas trying to improve accuracy here without having to
run for quite so long. There are few people who have this problem to
worry about, though; it's a common issue with benchmarking but not in
many other contexts.
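
A rough way to see why more checkpoints per run helps, assuming (crudely) that every checkpoint writes about the same amount: if one of n checkpoints falls outside the measured window, about 1/n of the checkpoint activity goes missing. The helper name below is hypothetical.

```python
def worst_case_undercount(n_checkpoints):
    """Fraction of checkpoint activity missed when one of n checkpoints is cut off.

    Assumes every checkpoint contributes equally, which is a simplification.
    """
    return 1.0 / n_checkpoints


# Compare a run with 2 checkpoints against one with 4.
for n in (2, 4):
    print(n, f"{worst_case_undercount(n):.0%}")
```

Under that assumption, getting n-1 instead of n checkpoints shifts the totals by about half when n is 2, and by about a quarter when n is 4.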

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
