Re: PATCH: regular logging of checkpoint progress

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Greg Smith" <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: regular logging of checkpoint progress
Date: 2011-08-26 08:46:33
Message-ID: 655fb4e00553434672d2a7be4bdcf78d.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 26 August 2011, 9:35, Greg Smith wrote:
> On 08/25/2011 04:57 PM, Tomas Vondra wrote:
>> (b) sends bgwriter stats (so that the buffers_checkpoint is updated)
>>
> As for throwing more log data out, I'm not sure what new analysis you're
> thinking of that it allows. I/O gets increasingly spiky as you zoom in
> on it; averaging over a shorter period can easily end up providing less
> insight about trends. If anything, I spend more time summarizing the
> data that's already there, rather than wanting to break them down. It's
> already providing way too much detail for most people. Customers tell
> me they don't care to see checkpoint stats unless they're across a day
> or more of sampling, so even the current "once every ~5 minutes" is way
> more info than they want. I have all this log parsing code and things
> that look at pg_stat_bgwriter to collect that data and produce higher
> level reports. And lots of it would break if any of this patch is added
> and people turn it on. I imagine other log/stat parsing programs might
> suffer issues too. That's your other hurdle for change here: the new
> analysis techniques have to be useful enough to justify that some
> downstream tool disruption is inevitable.

I was aware that by continuously updating pg_stat_bgwriter, the data won't
be synchronized (i.e. the buffers_checkpoint counter will change while
the number of requested/timed checkpoints remains the same).

But does that really break the tools that process the data? When you're
working with summarized data, the result should be more or less the same,
as the difference will be smoothed out by averaging etc. There is never
more than one checkpoint in progress, so if you get 24 checkpoints a day,
the error is at most one checkpoint out of 24. Yes, it's a difference, but
a small one.

A really crazy workaround would be to change checkpoints_requested /
checkpoints_timed to double, and use the fractional part to indicate the
progress of the current checkpoint. So for example 10.54 would mean 10
checkpoints completed and one checkpoint in progress, with 54% of its
blocks already written. But yes, that's a bit crazy.
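
To make the encoding concrete, here is a minimal sketch of how a monitoring
script could decode such a value. This is only an illustration of the idea
above: the function name is hypothetical and nothing like this exists in
pg_stat_bgwriter today.

```python
import math
from typing import Tuple


def split_checkpoint_counter(value: float) -> Tuple[int, float]:
    """Split a fractional counter into (completed checkpoints, progress).

    The integer part is the number of completed checkpoints, the
    fractional part is the share of blocks already written by the
    checkpoint that is currently in progress.
    """
    completed = math.floor(value)
    progress = value - completed
    return completed, progress


# The example from the text: 10 completed, 54% of the current one written.
completed, progress = split_checkpoint_counter(10.54)
percent_written = round(progress * 100)
print(completed, percent_written)
```

A tool that only wants completed checkpoints would take the integer part and
ignore the rest, which is exactly why changing the column type would still
disturb existing consumers.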

> If you have an idea for how to use this extra data for something useful,
> let's talk about what that is and see if it's possible to build it in
> instead. This problem is harder than it looks, mainly because the way
> the OS caches writes here makes trying to derive hard numbers from what
> the background writer is doing impossible. When the database writes
> things out, and when they actually get written to disk, they are not the
> same event. The actual write is often during the sync phase, and not
> being able to track that beast is where I see the most problems.
> The write phase, the easier part to instrument in the database, that is
> pretty boring. That's why the last extra logging I added here focused
> on adding visibility to the sync activity instead.

Hmmm, let me explain what led me to this patch - right now I'm doing a
comparison of filesystems with various block sizes (both fs and db
blocks). I've realized that the db block size significantly influences
the frequency of checkpoints and the amount of data to write, so I'm
collecting data from pg_stat_bgwriter too. The benchmark goes like this:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster
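
Steps 1 and 3 boil down to subtracting two snapshots of pg_stat_bgwriter.
A minimal sketch of that computation follows, assuming each snapshot has
already been fetched into a dict keyed by column name. The numbers are
made up purely for illustration and are not benchmark results.

```python
def bgwriter_delta(before, after):
    """Return the per-counter difference between two stats snapshots."""
    return {name: after[name] - before[name] for name in before}


# Hypothetical snapshots taken before and after the pgbench run.
before = {"checkpoints_timed": 3, "checkpoints_req": 1, "buffers_checkpoint": 12000}
after = {"checkpoints_timed": 4, "checkpoints_req": 3, "buffers_checkpoint": 50000}

delta = bgwriter_delta(before, after)

# Only completed checkpoints are visible here; buffers written by a
# checkpoint still in progress at snapshot time are not counted yet.
checkpoints = delta["checkpoints_timed"] + delta["checkpoints_req"]
buffers_per_checkpoint = delta["buffers_checkpoint"] / checkpoints
print(delta, buffers_per_checkpoint)
```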

The problem is that when the stats are collected in step 3, there might be
a checkpoint in progress, and in that case the stats are incomplete. In
some cases (especially with very small db blocks) this has a significant
impact because the checkpoints are less frequent.

I can't infer this from other data (e.g. iostat), because that does not
tell me which I/O is caused by the checkpoint.

Yes, this does not consider sync timing, but in my case that's not a big
issue (the page cache is rather small so the data are actually forced to
the drive soon).

I could probably live with keeping the current pg_stat_bgwriter logic
(i.e. updating just once) and writing checkpoint status just to the log. I
don't think that should break any current tools that parse logs, because
the message is completely different (prefixed with 'checkpoint status') so
any reasonably written tool should be OK.
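
As a rough sketch of what "reasonably written" means here: a parser that
matches on the message prefix can separate the new lines from the existing
ones. The sample lines below are placeholders, not real server output, and
the exact wording of the 'checkpoint status' message is an assumption.

```python
def split_checkpoint_lines(lines):
    """Separate 'checkpoint status' progress lines from all other lines."""
    status, other = [], []
    for line in lines:
        if "checkpoint status" in line:
            status.append(line)
        else:
            other.append(line)
    return status, other


# Placeholder log lines; the details after each prefix are elided on purpose.
sample = [
    "LOG:  checkpoint starting: <details>",
    "LOG:  checkpoint status: <details>",
    "LOG:  checkpoint complete: <details>",
]

status, other = split_checkpoint_lines(sample)
print(len(status), len(other))
```

A tool that instead matches any line containing just "checkpoint" would pick
up the new messages too, so it would need the extra prefix check.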

Tomas
