Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

From: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: Soumya S Murali <soumyamurali(dot)work(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, melanieplageman(at)gmail(dot)com
Subject: Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer
Date: 2025-11-24 10:07:41
Message-ID: 202511240955.vt3fjrb4ksrs@alvherre.pgsql
Lists: pgsql-hackers

On 2025-Nov-24, Michael Banck wrote:

> In general I doubt how much those gauges (as opposed to counters) only
> pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> What would be the use case for those two values?

I think it's useful to know how long a checkpoint has to work. It's a bit
lame to have only one duration (the last one), but at least with this
arrangement you can have external monitoring software connect to the
server, extract that value and save it somewhere else. Monitoring
systems do this all the time, and we've been waiting for a better
implementation to store monitoring data inside Postgres for years. I
think we shouldn't block this proposal just because of this issue,
because it can clearly be useful.

However, I'm not sure I'm very interested in knowing only the duration
of the checkpoint. I mean, much of the time the duration is going to be
whatever fraction of the checkpoint timeout you have as
checkpoint_completion_target, right? Which includes sleeps. So I think
you really want two durations: one is the duration itself, and the other
is what fraction of that the checkpointer spent sleeping in order to
achieve that duration. So you know how much time checkpointer spent trying
to get the operating system to do stuff rather than just sit there waiting.
We already have that data, kinda, in write_time and sync_time, but those
are cumulative rather than just for the last one. (I guess you can have
the monitoring system compute the deltas as it finds each new
checkpoint.) I'm not sure how good this system is.
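
To illustrate the delta computation mentioned above, here is a minimal
Python sketch of what a monitoring system might do. It assumes the poller
has already read num_timed + num_requested, write_time and sync_time
(cumulative, in milliseconds) from pg_stat_checkpointer. The Sample and
Delta types and the checkpoint_deltas function are hypothetical names for
illustration, not part of Postgres or of the patch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Sample:
    """One poll of pg_stat_checkpointer; all values are cumulative."""
    checkpoints: int    # assumed to be num_timed + num_requested
    write_time: float   # milliseconds
    sync_time: float    # milliseconds


@dataclass(frozen=True)
class Delta:
    """Work attributed to the checkpoint(s) completed between two polls."""
    checkpoints: int
    write_ms: float
    sync_ms: float


def checkpoint_deltas(samples: Iterable[Sample]) -> Iterator[Delta]:
    """Yield one Delta each time the checkpoint count advances."""
    prev: Optional[Sample] = None
    for cur in samples:
        if prev is None or cur.checkpoints < prev.checkpoints:
            # First poll, or the counters went backwards (stats reset):
            # start a new baseline and emit nothing.
            prev = cur
            continue
        completed = cur.checkpoints - prev.checkpoints
        if completed > 0:
            yield Delta(
                checkpoints=completed,
                write_ms=cur.write_time - prev.write_time,
                sync_ms=cur.sync_time - prev.sync_time,
            )
            # Only move the baseline when a checkpoint boundary is seen.
            prev = cur


polls = [
    Sample(10, 1000.0, 50.0),
    Sample(10, 1000.0, 50.0),   # no new checkpoint yet
    Sample(11, 1600.0, 80.0),   # one checkpoint finished
    Sample(13, 2600.0, 100.0),  # poller missed one; two finished
]
for d in checkpoint_deltas(polls):
    print(d)
```

If the polling interval is longer than the checkpoint interval, a single
delta covers several checkpoints, so this only approximates per-checkpoint
figures.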

In the past, I looked at a couple of monitoring dashboards offered by
cloud vendors, searching for anything valuable in terms of checkpoints.
What I saw was very disappointing -- mostly just "how many checkpoints
per minute", which is mostly flat zero with periodic spikes. Totally
useless. Does anybody know if some vendor has good charts for this?
Also, if we were to add this new proposed duration, how could these
charts improve?

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)
