Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-02-25 14:56:27
Message-ID: CAMm1aWbSC091NH1wRKZo=tO-F2WoNuYptpmP1bXEyyX+7YQvvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Thank you Alvaro and Matthias for your views. I understand your point
> of not updating the progress-report flag here as it just checks
> whether the CHECKPOINT_IMMEDIATE is set or not and takes an action
> based on that but it doesn't change the checkpoint flags. I will
> modify the code but I am a bit confused here. As per Alvaro, we need
> to make the progress-report flag change in whatever is the place that
> *requests* an immediate checkpoint. I feel this gives information
> about the upcoming checkpoint not the current one. So updating here
> provides wrong details in the view. The flags available during
> CreateCheckPoint() will remain same for the entire checkpoint
> operation and we should show the same information in the view till it
> completes. So just removing the above piece of code (modified in
> ImmediateCheckpointRequested()) in the patch will make it correct. My
> opinion about maintaining a separate field to show upcoming checkpoint
> flags is it makes the view complex. Please share your thoughts.

I have modified the code accordingly.
---

> I think the use of capitals in CHECKPOINT and CHECKPOINTER in the
> documentation is excessive.

Fixed. Here the word CHECKPOINT represents command/checkpoint
operation. If we treat it as a checkpoint operation, I agree to use
lowercase but if we treat it as command, then I think uppercase is
recommended (Refer
https://www.postgresql.org/docs/14/sql-checkpoint.html). Is it ok to
always use lowercase here?
---

> (Same for terms such as MULTIXACT and
> others in those docs; we typically use those in lowercase when
> user-facing; and do we really use term CLOG anymore? Don't we call it
> "commit log" nowadays?)

I have observed the CLOG term in the existing documentation. Anyways I
have changed MULTIXACT to multixact, SUBTRANS to subtransaction and
CLOG to commit log.
---

> + Whenever the checkpoint operation is running, the
> + <structname>pg_stat_progress_checkpoint</structname> view will contain a
> + single row indicating the progress of the checkpoint. The tables below
>
> Maybe it should show a single row , unless the checkpointer isn't running at
> all (like in single user mode).

Nice thought. Can we add an additional checkpoint phase like 'Idle'.
Idle is ON whenever the checkpointer process is running and there are
no on-going checkpoint Thoughts?
---

> + Process ID of a CHECKPOINTER process.
>
> It's *the* checkpointer process.

Fixed.
---

> pgstatfuncs.c has a whitespace issue (tab-space).

I have verified with 'git diff --check' and also manually. I did not
find any issue. Kindly mention the specific code which has an issue.
---

> I suppose the functions should set provolatile.

Fixed.
---

> > I am storing the checkpoint start timestamp in the st_progress_param[]
> > and this gets set only once during the checkpoint (at the start of the
> > checkpoint). I have added function
> > pg_stat_get_progress_checkpoint_elapsed() which calculates the elapsed
> > time and returns a string. This function gets called whenever
> > pg_stat_progress_checkpoint view is queried. Kindly refer v2 patch and
> > share your thoughts.
>
> I dislike the lack of access to the actual value of the checkpoint
> start / checkpoint elapsed field.
>
> As a user, if I query the pg_stat_progress_* views, my terminal or
> application can easily interpret an `interval` value and cast it to
> string, but the opposite is not true: the current implementation for
> pg_stat_get_progress_checkpoint_elapsed loses precision. This is why
> we use typed numeric fields in effectively all other places instead of
> stringified versions of the values: oid fields, counters, etc are all
> rendered as bigint in the view, so that no information is lost and
> interpretation is trivial.

Displaying start time of the checkpoint.
---

> > I understand that the log based reporting is very costly and very
> > frequent updates are not advisable. I am planning to use the existing
> > infrastructure of 'log_startup_progress_interval' which provides an
> > option for the user to configure the interval between each progress
> > update. Hence it avoids frequent updates to server logs. This approach
> > is used only during shutdown and end-of-recovery cases because we
> > cannot access pg_stat_progress_checkpoint view during those scenarios.
>
> I see; but log_startup_progress_interval seems to be exclusively
> consumed through the ereport_startup_progress macro. Why put
> startup/shutdown logging on the same path as the happy flow of normal
> checkpoints?

You mean to say while updating the progress of the checkpoint, call
pgstat_progress_update_param() and then call
ereport_startup_progress() ?

> I think that, instead of looking to what might at some point be added,
> it is better to use the currently available functions instead, and
> move to new functions if and when the log-based reporting requires it.

Make sense. Removing checkpoint_progress_update_param() and
checkpoint_progress_end(). I would like to concentrate on
pg_stat_progress_checkpoint view as of now and I will consider log
based reporting later.

> > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
>
> With the new pg_stat_progress_checkpoint, you should also add a
> backreference to this progress reporting in the CHECKPOINT sql command
> documentation located in checkpoint.sgml, and maybe in wal.sgml and/or
> backup.sgml too. See e.g. cluster.sgml around line 195 for an example.

I have updated in checkpoint.sqml and wal.sqml.

> > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
> > + ( SELECT '0/0'::pg_lsn +
> > + ((CASE
> > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> > + ELSE 0::numeric
> > + END) +
> > + stat.lsn_int64::numeric)
> > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> > + ) AS start_lsn,
>
> My LSN select statement was an example that could be run directly in
> psql; the so you didn't have to embed the SELECT into the view query.
> The following should be sufficient (and save the planner a few cycles
> otherwise spent in inlining):
>
> + ('0/0'::pg_lsn +
> + ((CASE
> + WHEN s.param3 < 0 THEN pow(2::numeric,
> 64::numeric)::numeric
> + ELSE 0::numeric
> + END) +
> + s.param3::numeric)
> + ) AS start_lsn,

Thanks for the suggestion. Fixed.

> > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > +checkpoint_progress_start(int flags)
> > [...]
> > +checkpoint_progress_update_param(int index, int64 val)
> > [...]
> > +checkpoint_progress_end(void)
> > +{
> > + /* In bootstrap mode, we don't actually record anything. */
> > + if (IsBootstrapProcessingMode())
> > + return;
>
> Disabling pgstat progress reporting when in bootstrap processing mode
> / startup/end-of-recovery makes very little sense (see upthread) and
> should be removed, regardless of whether seperate functions stay.

Removed since log based reporting is not part of the current patch.

> > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
> > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0
>
> Generally, enum-like values in a stat_progress field are 1-indexed, to
> differentiate between empty/uninitialized (0) and states that have
> been set by the progress reporting infrastructure.

Fixed.

Please find the v3 patch attached and share your thoughts.

Thanks & Regards,
Nitin Jadhav
On Fri, Feb 25, 2022 at 12:23 AM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> > I think the change to ImmediateCheckpointRequested() makes no sense.
> > Before this patch, that function merely inquires whether there's an
> > immediate checkpoint queued. After this patch, it ... changes a
> > progress-reporting flag? I think it would make more sense to make the
> > progress-report flag change in whatever is the place that *requests* an
> > immediate checkpoint rather than here.
> >
> > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> > > +ImmediateCheckpointRequested(int flags)
> > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> > > + {
> > > + updated_flags |= CHECKPOINT_IMMEDIATE;
> >
> > I don't think that these changes are expected behaviour. Under in this
> > condition; the currently running checkpoint is still not 'immediate',
> > but it is going to hurry up for a new, actually immediate checkpoint.
> > Those are different kinds of checkpoint handling; and I don't think
> > you should modify the reported flags to show that we're going to do
> > stuff faster than usual. Maybe maintiain a seperate 'upcoming
> > checkpoint flags' field instead?
>
> Thank you Alvaro and Matthias for your views. I understand your point
> of not updating the progress-report flag here as it just checks
> whether the CHECKPOINT_IMMEDIATE is set or not and takes an action
> based on that but it doesn't change the checkpoint flags. I will
> modify the code but I am a bit confused here. As per Alvaro, we need
> to make the progress-report flag change in whatever is the place that
> *requests* an immediate checkpoint. I feel this gives information
> about the upcoming checkpoint not the current one. So updating here
> provides wrong details in the view. The flags available during
> CreateCheckPoint() will remain same for the entire checkpoint
> operation and we should show the same information in the view till it
> completes. So just removing the above piece of code (modified in
> ImmediateCheckpointRequested()) in the patch will make it correct. My
> opinion about maintaining a separate field to show upcoming checkpoint
> flags is it makes the view complex. Please share your thoughts.
>
> Thanks & Regards,
>
> On Thu, Feb 24, 2022 at 10:45 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> > On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav
> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > >
> > > Sharing the v2 patch. Kindly have a look and share your comments.
> >
> > Thanks for updating.
> >
> > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> >
> > With the new pg_stat_progress_checkpoint, you should also add a
> > backreference to this progress reporting in the CHECKPOINT sql command
> > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or
> > backup.sgml too. See e.g. cluster.sgml around line 195 for an example.
> >
> > > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> > > +ImmediateCheckpointRequested(int flags)
> > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> > > + {
> > > + updated_flags |= CHECKPOINT_IMMEDIATE;
> >
> > I don't think that these changes are expected behaviour. Under in this
> > condition; the currently running checkpoint is still not 'immediate',
> > but it is going to hurry up for a new, actually immediate checkpoint.
> > Those are different kinds of checkpoint handling; and I don't think
> > you should modify the reported flags to show that we're going to do
> > stuff faster than usual. Maybe maintiain a seperate 'upcoming
> > checkpoint flags' field instead?
> >
> > > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
> > > + ( SELECT '0/0'::pg_lsn +
> > > + ((CASE
> > > + WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> > > + ELSE 0::numeric
> > > + END) +
> > > + stat.lsn_int64::numeric)
> > > + FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> > > + ) AS start_lsn,
> >
> > My LSN select statement was an example that could be run directly in
> > psql; the so you didn't have to embed the SELECT into the view query.
> > The following should be sufficient (and save the planner a few cycles
> > otherwise spent in inlining):
> >
> > + ('0/0'::pg_lsn +
> > + ((CASE
> > + WHEN s.param3 < 0 THEN pow(2::numeric,
> > 64::numeric)::numeric
> > + ELSE 0::numeric
> > + END) +
> > + s.param3::numeric)
> > + ) AS start_lsn,
> >
> >
> > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > > +checkpoint_progress_start(int flags)
> > > [...]
> > > +checkpoint_progress_update_param(int index, int64 val)
> > > [...]
> > > +checkpoint_progress_end(void)
> > > +{
> > > + /* In bootstrap mode, we don't actually record anything. */
> > > + if (IsBootstrapProcessingMode())
> > > + return;
> >
> > Disabling pgstat progress reporting when in bootstrap processing mode
> > / startup/end-of-recovery makes very little sense (see upthread) and
> > should be removed, regardless of whether seperate functions stay.
> >
> > > diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
> > > +#define PROGRESS_CHECKPOINT_PHASE_INIT 0
> >
> > Generally, enum-like values in a stat_progress field are 1-indexed, to
> > differentiate between empty/uninitialized (0) and states that have
> > been set by the progress reporting infrastructure.
> >
> >
> >
> > Kind regards,
> >
> > Matthias van de Meent

Attachment Content-Type Size
v3-0001-pg_stat_progress_checkpoint-view.patch application/octet-stream 32.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2022-02-25 14:58:48 Size functions inconsistent results
Previous Message Greg Stark 2022-02-25 14:52:49 Re: Design of pg_stat_subscription_workers vs pgstats