Quick Links

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

From:	Michael Banck <mbanck(at)gmx(dot)net>
To:	Álvaro Herrera <alvherre(at)kurilemu(dot)de>
Cc:	Soumya S Murali <soumyamurali(dot)work(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, melanieplageman(at)gmail(dot)com
Subject:	Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer
Date:	2025-11-24 11:05:41
Message-ID:	20251124110541.GD8364@p46.dedyn.io;lightning.p46.dedyn.io
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On Mon, Nov 24, 2025 at 11:07:41AM +0100, Álvaro Herrera wrote:
> On 2025-Nov-24, Michael Banck wrote:
>
> > In general I doubt how much those gauges (as oppposed to counters) only
> > pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> > What would be the use case for those two values?
>
> I think it's useful to know how long checkpoint has to work. It's a bit
> lame to have only one duration (the last one), but at least with this
> arrangement you can have external monitoring software connect to the
> server, extract that value and save it somewhere else. Monitoring
> systems do this all the time, and we've been waiting for a better
> implementation to store monitoring data inside Postgres for years. I
> think we shouldn't block this proposal just because of this issue,
> because it can clearly be useful.

I don't know - what happens if the monitoring systems reads those values
every minute, but then suddenly Postgres checkpoints every 20 seconds
due to a traffic spike? It would just not see those additional
checkpoints in this case, no?

What monitoring systems do (have to do) is query write_time + sync_time
as total_time in pg_stat_checkpointer and store that along with the
timestamp of the query. Then you (maybe awkwardly) generate a graph of
the checkpoint durations over time.

> However, I'm not sure I'm very interested in knowing only the duration
> of the checkpoint. I mean, much of the time the duration is going to be
> whatever fraction of the checkpoint timeout you have as
> checkpoint_completion_target, right? Which includes sleeps.

Yeah, that is the other thing I was wondering about, but did not mention
in my mail, good point.

> So I think you really want two durations: one is the duration itself,
> and the other is what fraction of that did the checkpointer sleep in
> order to achieve that duration. So you know how much time
> checkpointer spent trying to get the operating system do stuff rather
> than just sit there waiting. We already have that data, kinda, in
> write_time and sync_time, but those are cumulative rather than just
> for the last one.

I think that we either have "last timestamp whatever" or "total", but I
think we don't have "last duration" anywhere?

> (I guess you can have the monitoring system compute
> the deltas as it finds each new checkpoint.) I'm not sure how good
> this system is.

Right, this is what I meant above. But from what I see on PG18,
total_time just seems tbe write_time + sync_time, do we have the sleep
somewhere?

> In the past, I looked at a couple of monitoring dashboards offered by
> cloud vendors, searching for anything valuable in terms of checkpoints.
> What I saw was very disappointing -- mostly just "how many checkpoints
> per minute", which is mostly flat zero with periodic spikes. Totally
> useless. Does anybody know if some vendor has good charts for this?
> Also, if we were to add this new proposed duration, how could these
> charts improve?

I don't have a good answer here.

Michael

In response to

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer at 2025-11-24 10:07:41 from Álvaro Herrera

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dave Page	2025-11-24 11:33:12	Re: Build failure with Meson >= 1.8.3 on Windows
Previous Message	Nazir Bilal Yavuz	2025-11-24 11:04:01	Re: BUG #19095: Test if function exit() is used fail when linked static