Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-06-13 13:56:39
Message-ID: CAMm1aWYhnsoOgOJ7FPDa_hjriu=0YNz2QFaR6H508MOxTQN4bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Have you measured the performance effects of this? On fast storage with large
> > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> > be good to verify that.
>
> I am wondering if we could make the function inlined at some point.
> We could also play it safe and only update the counters every N loops
> instead.

The idea looks good but based on the performance numbers shared above,
it is not affecting the performance. So we can use the current
approach as it gives more accurate progress.
---

> > This view is depressingly complicated. Added up the view definitions for
> > the already existing pg_stat_progress* views add up to a measurable part of
> > the size of an empty database:
>
> Yeah. I think that what's proposed could be simplified, and we had
> better remove the fields that are not that useful. First, do we have
> any need for next_flags?

"next_flags" is removed in the v6 patch. Added a "new_requests" field
to get to know whether the current checkpoint is being throttled or
not. Please share your views on this.
---

> Second, is the start LSN really necessary
> for monitoring purposes?

IMO, start LSN is necessary to debug if the checkpoint is taking longer.
---

> Not all the information in the first
> parameter is useful, as well. For example "shutdown" will never be
> seen as it is not possible to use a session at this stage, no?

I understand that "shutdown" and "end-of-recovery" will never be seen
and I have removed it in the v6 patch.
---

> There
> is also no gain in having "immediate", "flush-all", "force" and "wait"
> (for this one if the checkpoint is requested the session doing the
> work knows this information already).

"immediate" is required to understand whether the current checkpoint
is throttled or not. I am not sure about other flags "flush-all",
"force" and "wait". I have just supported all the flags to match the
'checkpoint start' log message. Please share your views. If it is not
really required, I will remove it in the next patch.
---

> A last thing is that we may gain in visibility by having more
> attributes as an effect of splitting param2. On thing that would make
> sense is to track the reason why the checkpoint was triggered
> separately (aka wal and time). Should we use a text[] instead to list
> all the parameters instead? Using a space-separated list of items is
> not intuitive IMO, and callers of this routine will likely parse
> that.

If I understand the above comment correctly, you are saying to
introduce a new field, say "reason" ( possible values are either wal
or time) and the "flags" field will continue to represent the other
flags like "immediate", etc. The idea looks good here. We can
introduce new field "reason" and "flags" field can be renamed to
"throttled" (true/false) if we decide to not support other flags
"flush-all", "force" and "wait".
---

> + WHEN 3 THEN 'checkpointing replication slots'
> + WHEN 4 THEN 'checkpointing logical replication snapshot files'
> + WHEN 5 THEN 'checkpointing logical rewrite mapping files'
> + WHEN 6 THEN 'checkpointing replication origin'
> + WHEN 7 THEN 'checkpointing commit log pages'
> + WHEN 8 THEN 'checkpointing commit time stamp pages'
> There is a lot of "checkpointing" here. All those terms could be
> shorter without losing their meaning.

I will try to make it short in the next patch.
---

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Tue, Apr 5, 2022 at 3:15 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Mar 18, 2022 at 05:15:56PM -0700, Andres Freund wrote:
> > Have you measured the performance effects of this? On fast storage with large
> > shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> > be good to verify that.
>
> I am wondering if we could make the function inlined at some point.
> We could also play it safe and only update the counters every N loops
> instead.
>
> > This view is depressingly complicated. Added up the view definitions for
> > the already existing pg_stat_progress* views add up to a measurable part of
> > the size of an empty database:
>
> Yeah. I think that what's proposed could be simplified, and we had
> better remove the fields that are not that useful. First, do we have
> any need for next_flags? Second, is the start LSN really necessary
> for monitoring purposes? Not all the information in the first
> parameter is useful, as well. For example "shutdown" will never be
> seen as it is not possible to use a session at this stage, no? There
> is also no gain in having "immediate", "flush-all", "force" and "wait"
> (for this one if the checkpoint is requested the session doing the
> work knows this information already).
>
> A last thing is that we may gain in visibility by having more
> attributes as an effect of splitting param2. On thing that would make
> sense is to track the reason why the checkpoint was triggered
> separately (aka wal and time). Should we use a text[] instead to list
> all the parameters instead? Using a space-separated list of items is
> not intuitive IMO, and callers of this routine will likely parse
> that.
>
> Shouldn't we also track the number of files flushed in each sub-step?
> In some deployments you could have a large number of 2PC files and
> such. We may want more information on such matters.
>
> + WHEN 3 THEN 'checkpointing replication slots'
> + WHEN 4 THEN 'checkpointing logical replication snapshot files'
> + WHEN 5 THEN 'checkpointing logical rewrite mapping files'
> + WHEN 6 THEN 'checkpointing replication origin'
> + WHEN 7 THEN 'checkpointing commit log pages'
> + WHEN 8 THEN 'checkpointing commit time stamp pages'
> There is a lot of "checkpointing" here. All those terms could be
> shorter without losing their meaning.
>
> This patch still needs some work, so I am marking it as RwF for now.
> --
> Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-06-13 13:59:01 Re: Improve TAP tests of pg_upgrade for cross-version tests
Previous Message Nitin Jadhav 2022-06-13 13:38:35 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)