Re: PATCH: regular logging of checkpoint progress

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: regular logging of checkpoint progress
Date: 2011-09-03 06:19:14
Message-ID: 4E61C6E2.7050503@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/02/2011 11:10 AM, Tomas Vondra wrote:
> My 'ideal' solution would be either to add another GUC (to turn this
> on/off) or allow log_checkpoints to have three values
>
> log_checkpoints = {off, normal, detailed}
>
> where 'normal' provides the current output and 'detail' produces this much
> verbose output.
>

If this is going to be acceptable, that's likely the only path it could
happen by and still meet what you're looking for. I will just again
stress that the part you're working on instrumenting better right now is
not actually where larger systems really run into the most problems
here, based on what I've seen. I added a series of log messages to 9.1
at DEBUG1, aimed at tracking the sync phase. That's where I see many
more checkpoint issues than in the write one. On Linux in particular,
it's almost impossible for the write phase to be more of a problem than
the sync one.

So the logging you're adding here I don't ever expect to turn on. But I
wouldn't argue against an option to handle the logging use-case you're
concerned about. Letting people observe for themselves and decide which
of the phases is more interesting to their workload seems appropriate.
Then users have options for what to log, no matter which type of problem
they run into.

If you're expanding log_checkpoints to an enum, for that to handle what
I think everybody might ever want (for what checkpoints do now at
least), I'd find that more useful if it happened like this instead:

log_checkpoints = {off, on, write, sync, verbose}

I don't think you should change the semantics of off/on, which will
avoid breaking existing postgresql.conf files and resources that suggest
tuning advice. "write" can toggle on what you're adding; "sync" should
control whether the DEBUG1 messages showing the individual file names in
the sync phase appear; and "verbose" can include both.

As far as a heuristic for making this less chatty when there's nothing
exciting happening goes, I think something based on how much time has
passed would be the best one. In your use case, I would guess you don't
really care whether a message appears every n%. If I understand you
correctly now, you would mainly care about getting enough log detail to
know 1) when things are running really slow, or b) often enough that the
margin of error in your benchmark results from unaccounted checkpoint
writes is acceptable. In both of those cases, I'd think a time-based
threshold would be appropriate, and that also deals with the time-based
checkpoints, too.

If your logging criteria for the write phase was "display a message any
time more than 30 seconds have passed since last seeing one", that would
give you only a few lines of output in a boring, normal
checkpoint--certainly less than the 9 in-progress samples you're
outputting now, at 10% intervals. But in the pathological situations
where writes are super slow, your log data would become correspondingly
denser, which is exactly what you want in that situation.

I think combining the two makes the most sense: "log when >=30 seconds
have passed since the last message, and there's been >=10% more progress
made". (Maybe do the progress check before the time one, to cut down on
gettimeofday() calls) That would give you 4 in-progress reports during
a standard 2.5 minute write phase, and in cases where the checkpoints
are taking a long time you'd get as many as 9. That's pretty close to
auto-tuning the amount of log output, so the amount of it is roughly
proportional to how likely the information it's logging will be
interesting.

We certainly don't want to add yet another GUC just to control the
frequency. I don't think it will be too hard to put two hard-coded
thresholds in and do good enough for just about everyone though. I
would probably prefer setting those thresholds to 60 seconds/20%
instead. That might not be detailed enough for you though.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-03 14:23:09 Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge
Previous Message Jeremy Drake 2011-09-03 05:54:33 Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge