Re: Logging idle checkpoints

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: michael(dot)paquier(at)gmail(dot)com, vik(dot)fearing(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Logging idle checkpoints
Date: 2017-10-03 12:22:27
Message-ID: 20171003122227.GJ4628@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Kyotaro HORIGUCHI (horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp) wrote:
> At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQ3Q1J_wBC7yPXk39dO0RGvbM4-nYp2gMrCJ7pfPJXcYw(at)mail(dot)gmail(dot)com>
> > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > I certainly don't care for the idea of adding log messages saying we
> > > aren't doing anything just to match a count that's incorrectly claiming
> > > that checkpoints are happening when they aren't.
> > >
> > > The down-thread suggestion of keeping track of skipped checkpoints might
> > > be interesting, but I'm not entirely convinced it really is. We have
> > > time to debate that, of course, but I don't really see how that's
> > > helpful. At the moment, it seems like the suggestion to add that column
> > > is based on the assumption that we're going to start logging skipped
> > > checkpoints and having that column would allow us to match up the count
> > > between the new column and the "skipped checkpoint" messages in the logs
> > > and I can not help but feel that this is a ridiculous amount of effort
> > > being put into the analysis of something that *didn't* happen.
> >
> > Being able to look at how many checkpoints are skipped can be used as
> > a tuning indicator of max_wal_size and checkpoint_timeout, or in short
> > increase them if those remain idle.
>
> We ususally adjust the GUCs based on how often checkpoint is
> *executed* and how many of the executed checkpoints have been
> triggered by xlog progress (or with shorter interval than
> timeout). It seems enough. Counting skipped checkpoints gives
> just a rough estimate of how long the system was getting no
> substantial updates. I doubt that users get something valuable by
> counting skipped checkpoints.

Yeah, I tend to agree. I don't really see how counting skipped
checkpoints helps to size max_wal_size or even checkpoint_timeout. The
whole point here is that nothing is happening and if nothing is
happening then there's no real need to adjust max_wal_size or
checkpoint_timeout or, well, much of anything really..

> > Since their introduction in
> > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> > number of checkpoint requests, not if a checkpoint has been actually
> > executed or not, I am not sure that this should be changed after 10
> > years. So, to put it in other words, wouldn't we want a way to track
> > checkpoints that are *executed*, meaning that we could increment a
> > counter after doing the skip checks in CreateRestartPoint() and
> > CreateCheckPoint()?
>
> This sounds reasonable to me.

I agree that tracking executed checkpoints is valuable, but, and perhaps
I'm missing something, isn't that the same as tracking non-skipped
checkpoints? I suppose we could have both, if we really feel the need,
provided that doesn't result in more work or effort being done than
simply keeping the count. I'd hate to end up in a situation where we're
writing things out unnecessairly just to keep track of checkpoints that
were requested but ultimately skipped because there wasn't anything to
do.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2017-10-03 12:47:32 Re: Postgresql gives error that role goes not exists while it exists
Previous Message Amit Khandekar 2017-10-03 12:16:52 Re: UPDATE of partition key