Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-04-05 09:45:44
Message-ID: YkwPyAoTq/KmTX2B@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 18, 2022 at 05:15:56PM -0700, Andres Freund wrote:
> Have you measured the performance effects of this? On fast storage with large
> shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> be good to verify that.

I am wondering if we could make the function inlined at some point.
We could also play it safe and only update the counters every N loops
instead.

> This view is depressingly complicated. Added up the view definitions for
> the already existing pg_stat_progress* views add up to a measurable part of
> the size of an empty database:

Yeah. I think that what's proposed could be simplified, and we had
better remove the fields that are not that useful. First, do we have
any need for next_flags? Second, is the start LSN really necessary
for monitoring purposes? Not all the information in the first
parameter is useful, as well. For example "shutdown" will never be
seen as it is not possible to use a session at this stage, no? There
is also no gain in having "immediate", "flush-all", "force" and "wait"
(for this one if the checkpoint is requested the session doing the
work knows this information already).

A last thing is that we may gain in visibility by having more
attributes as an effect of splitting param2. On thing that would make
sense is to track the reason why the checkpoint was triggered
separately (aka wal and time). Should we use a text[] instead to list
all the parameters instead? Using a space-separated list of items is
not intuitive IMO, and callers of this routine will likely parse
that.

Shouldn't we also track the number of files flushed in each sub-step?
In some deployments you could have a large number of 2PC files and
such. We may want more information on such matters.

+ WHEN 3 THEN 'checkpointing replication slots'
+ WHEN 4 THEN 'checkpointing logical replication snapshot files'
+ WHEN 5 THEN 'checkpointing logical rewrite mapping files'
+ WHEN 6 THEN 'checkpointing replication origin'
+ WHEN 7 THEN 'checkpointing commit log pages'
+ WHEN 8 THEN 'checkpointing commit time stamp pages'
There is a lot of "checkpointing" here. All those terms could be
shorter without losing their meaning.

This patch still needs some work, so I am marking it as RwF for now.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2022-04-05 09:49:10 Re: Frontend error logging style
Previous Message Bharath Rupireddy 2022-04-05 09:42:09 Re: Extensible Rmgr for Table AMs