Re: PATCH: regular logging of checkpoint progress

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: regular logging of checkpoint progress
Date: 2011-08-26 07:35:05
Message-ID: 4E574CA9.8090809@2ndQuadrant.com
Lists: pgsql-hackers

On 08/25/2011 04:57 PM, Tomas Vondra wrote:
> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)
>

The idea behind only updating the stats in one chunk, at the end, is
that it makes one specific thing easier to do. Let's say you're running
a monitoring system that is grabbing snapshots of pg_stat_bgwriter
periodically. If you want to figure out how much work a checkpoint did,
you only need two points of data to compute that right now. Whenever
you see either of the checkpoint count numbers increase, you just
subtract off the previous sample; now you've got a delta for how many
buffers that checkpoint wrote out. You can derive the information about
the buffer counts involved that appears in the logs quite easily this
way. The intent was to make that possible to do, so that people can
figure this out without needing to parse the log data.
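
To make the two-sample idea concrete, here is a minimal sketch in Python.
It assumes each snapshot is a plain dict holding the pg_stat_bgwriter
columns checkpoints_timed, checkpoints_req and buffers_checkpoint; the
function name and the sample numbers are made up for illustration, and
how the snapshots get collected is left out.

```python
from typing import Dict, Iterable, Iterator

# One sample of pg_stat_bgwriter, reduced to the columns used here.
Snapshot = Dict[str, int]


def checkpoint_deltas(snapshots: Iterable[Snapshot]) -> Iterator[Dict[str, int]]:
    """Yield a delta whenever either checkpoint counter increased
    between two consecutive samples (hypothetical helper)."""
    prev = None
    for cur in snapshots:
        if prev is not None:
            finished = (
                (cur["checkpoints_timed"] - prev["checkpoints_timed"])
                + (cur["checkpoints_req"] - prev["checkpoints_req"])
            )
            if finished > 0:
                # Subtract off the previous sample to get the buffers
                # written by the checkpoint(s) that completed in between.
                yield {
                    "checkpoints": finished,
                    "buffers_checkpoint": (
                        cur["buffers_checkpoint"] - prev["buffers_checkpoint"]
                    ),
                }
        prev = cur


# Illustrative values only, not real measurements.
samples = [
    {"checkpoints_timed": 10, "checkpoints_req": 2, "buffers_checkpoint": 5000},
    {"checkpoints_timed": 10, "checkpoints_req": 2, "buffers_checkpoint": 5000},
    {"checkpoints_timed": 11, "checkpoints_req": 2, "buffers_checkpoint": 6200},
]
deltas = list(checkpoint_deltas(samples))
```

This only attributes the delta cleanly to a single checkpoint when the
stats are sent in one chunk at the end and the sampling interval is
shorter than the time between checkpoints; otherwise the delta covers
however many checkpoints finished between the two samples.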

Spreading out the updates defeats that idea. On the other hand, it does
make it possible to see the buffer writes closer to real time, as they
happen. You can make a case for both approaches; the above is
just summarizing the logic behind why it's done the way it is right
now. I don't think many people are actually doing things with this to
the level where their tool will care. The most popular consumer of
pg_stat_bgwriter data I see is Munin graphing changes, and I don't think
it will care either way.

Giving people the option of doing it the other way is a reasonable idea,
but I'm not sure there's enough use case there to justify adding a GUC
just for that. My next goal here is to eliminate checkpoint_segments,
not to add yet another tunable extremely few users would ever touch.

As for throwing more log data out, I'm not sure what new analysis you're
thinking of that it allows. I/O gets increasingly spiky as you zoom in
on it; averaging over a shorter period can easily end up providing less
insight about trends. If anything, I spend more time summarizing the
data that's already there than wanting to break it down further. It's
already providing way too much detail for most people. Customers tell
me they don't care to see checkpoint stats unless they're across a day
or more of sampling, so even the current "once every ~5 minutes" is way
more info than they want. I have all this log parsing code and things
that look at pg_stat_bgwriter to collect that data and produce
higher-level reports, and lots of it would break if any of this patch is
added and people turn it on. I imagine other log/stat parsing programs
might suffer issues too. That's your other hurdle for change here: the
new analysis techniques have to be useful enough to justify the
downstream tool disruption they will inevitably cause.

If you have an idea for how to use this extra data for something useful,
let's talk about what that is and see if it's possible to build it in
instead. This problem is harder than it looks, mainly because the way
the OS caches writes here makes it impossible to derive hard numbers
from what the background writer is doing. When the database writes
things out and when they actually get written to disk are not the same
event. The actual write often happens during the sync phase, and not
being able to track that beast is where I see the most problems. The
write phase, the easier part to instrument in the database, is pretty
boring. That's why the last extra logging I added here focused on
adding visibility to the sync activity instead.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
