Re: Log levels for checkpoint/bgwriter monitoring

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Log levels for checkpoint/bgwriter monitoring
Date: 2007-03-27 17:40:37
Message-ID: 200703271740.l2RHebN07393@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Magnus Hagander wrote:
> Would not at least some of these numbers be better presented through the
> stats collector, so they can be easily monitored?
>
> That goes along the line of my way way way away from finished attempt
> earlier, perhaps a combination of these two patches?

Yes.

---------------------------------------------------------------------------

>
> //Magnus
>
>
> Bruce Momjian wrote:
> > Is this patch ready?
> >
> > ---------------------------------------------------------------------------
> >
> > Greg Smith wrote:
> >> I have a WIP patch that adds the main detail I have found I need to
> >> properly tune checkpoint and background writer activity. I think it's
> >> almost ready to submit (you can see the current patch against 8.2 at
> >> http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt )
> >> after making it a bit more human-readable. But I've realized that along
> >> with that, I need some guidance in regards to what log level is
> >> appropriate for this information.
> >>
> >> An example works better than explaining what the patch does:
> >>
> >> 2007-02-19 21:53:24.602 EST - DEBUG: checkpoint required (wrote
> >> checkpoint_segments)
> >> 2007-02-19 21:53:24.685 EST - DEBUG: checkpoint starting
> >> 2007-02-19 21:53:24.705 EST - DEBUG: checkpoint flushing buffer pool
> >> 2007-02-19 21:53:24.985 EST - DEBUG: checkpoint database fsync starting
> >> 2007-02-19 21:53:42.725 EST - DEBUG: checkpoint database fsync complete
> >> 2007-02-19 21:53:42.726 EST - DEBUG: checkpoint buffer flush dirty=8034
> >> write=279956 us sync=17739974 us
> >>
> >> Remember that "Load distributed checkpoint" discussion back in December?
> >> You can see exactly how bad the problem is on your system with this log
> >> style (this is from a pgbench run where it's postively awful--it really
> >> does take over 17 seconds for the fsync to execute, and there are clients
> >> that are hung the whole time waiting for it).
> >>
> >> I also instrumented the background writer. You get messages like this:
> >>
> >> 2007-02-19 21:58:54.328 EST - DEBUG: BGWriter Scan All - Written = 5/5
> >> Unscanned = 23/54
> >>
> >> This shows that we wrote (5) the maximum pages we were allowed to write
> >> (5) while failing to scan almost half (23) of the buffers we meant to look
> >> at (54). By taking a look at this output while the system is under load,
> >> I found I was able to do bgwriter optimization that used to take me days
> >> of frustrating testing in hours. I've been waiting for a good guide to
> >> bgwriter tuning since 8.1 came out. Once you have this, combined with
> >> knowing how many buffers were dirty at checkpoint time because the
> >> bgwriter didn't get to them in time (the number you want to minimize), the
> >> tuning guide practically writes itself.
> >>
> >> So my question is...what log level should all this go at? Right now, I
> >> have the background writer stuff adjusting its level dynamically based on
> >> what happened; it logs at DEBUG2 if it hits the write limit (which should
> >> be a rare event once you're tuned properly), DEBUG3 for writes that
> >> scanned everything they were supposed to, and DEBUG4 if it scanned but
> >> didn't find anything to write. The source of checkpoint information logs
> >> at DEBUG1, the fsync/write info at DEBUG2.
> >>
> >> I'd like to move some of these up. On my system, I even have many of the
> >> checkpoint messages logged at INFO (the source of the checkpoint and the
> >> total write time line). It's a bit chatty, but when you get some weird
> >> system pause issue it makes it easy to figure out if checkpoints were to
> >> blame. Given how useful I feel some of these messages are to system
> >> tuning, and to explaining what currently appears as inexplicable pauses, I
> >> don't want them to be buried at DEBUG levels where people are unlikely to
> >> ever see them (I think some people may be concerned about turning on
> >> things labeled DEBUG at all). I am aware that I am too deep into this to
> >> have an unbiased opinion at this point though, which is why I ask for
> >> feedback on how to proceed here.
> >>
> >> --
> >> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 7: You can help support the PostgreSQL project by donating at
> >>
> >> http://www.postgresql.org/about/donate
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sailesh Krishnamurthy 2007-03-27 17:56:30 Re: Concurrent connections in psql
Previous Message Magnus Hagander 2007-03-27 17:30:12 Re: Log levels for checkpoint/bgwriter monitoring