Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-03-04 12:20:47
Message-ID: CAE9k0Pn5VK4et99DkXg-gpyTbFjcfVR5oVdGNsjxcpN=mPFqZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Please don't mix comments from multiple reviewers into one thread.
It's hard to understand which comments are mine or Julien's or from
others. Can you please respond to the email from each of us separately
with an inline response. That will be helpful to understand your
thoughts on our review comments.

--
With Regards,
Ashutosh Sharma.

On Fri, Mar 4, 2022 at 4:59 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> Thanks for reviewing.
>
> > 6) How about shutdown and end-of-recovery checkpoint? Are you planning
> > to have an ereport_startup_progress mechanism as 0002?
>
> I thought of including it earlier then I felt lets first make the
> current patch stable. Once all the fields are properly decided and the
> patch gets in then we can easily extend the functionality to shutdown
> and end-of-recovery cases. I have also observed that the timer
> functionality wont work properly in case of shutdown as we are doing
> an immediate checkpoint. So this needs a lot of discussion and I would
> like to handle this on a separate thread.
> ---
>
> > 7) I think you don't need to call checkpoint_progress_start and
> > pgstat_progress_update_param, any other progress reporting function
> > for shutdown and end-of-recovery checkpoint right?
>
> I had included the guards earlier and then removed later based on the
> discussion upthread.
> ---
>
> > [local]:5432 ashu(at)postgres=# select * from pg_stat_progress_checkpoint;
> > -[ RECORD 1 ]-----+-------------------------------------
> > pid | 22043
> > type | checkpoint
> > kind | immediate force wait requested time
> > start_lsn | 0/14C60F8
> > start_time | 2022-03-03 18:59:56.018662+05:30
> > phase | performing two phase checkpoint
> >
> >
> > This is the output I see when the checkpointer process has come out of
> > the two phase checkpoint and is currently writing checkpoint xlog
> > records and doing other stuff like updating control files etc. Is this
> > okay?
>
> The idea behind choosing the phases is based on the functionality
> which takes longer time to execute. Since after two phase checkpoint
> till post checkpoint cleanup won't take much time to execute, I have
> not added any additional phase for that. But I also agree that this
> gives wrong information to the user. How about mentioning the phase
> information at the end of each phase like "Initializing",
> "Initialization done", ..., "two phase checkpoint done", "post
> checkpoint cleanup done", .., "finalizing". Except for the first phase
> ("initializing") and last phase ("finalizing"), all the other phases
> describe the end of a certain operation. I feel this gives correct
> information even though the phase name/description does not represent
> the entire code block between two phases. For example if the current
> phase is ''two phase checkpoint done". Then the user can infer that
> the checkpointer has done till two phase checkpoint and it is doing
> other stuff that are after that. Thoughts?
>
> > The output of log_checkpoint shows the number of buffers written is 3
> > whereas the output of pg_stat_progress_checkpoint shows it as 0. See
> > below:
> >
> > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3
> > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2,
> > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB
> >
> > --
> >
> > [local]:5432 ashu(at)postgres=# select * from pg_stat_progress_checkpoint;
> > -[ RECORD 1 ]-----+-------------------------------------
> > pid | 22043
> > type | checkpoint
> > kind | immediate force wait requested time
> > start_lsn | 0/14C60F8
> > start_time | 2022-03-03 18:59:56.018662+05:30
> > phase | finalizing
> > buffers_total | 0
> > buffers_processed | 0
> > buffers_written | 0
> >
> > Any idea why this mismatch?
>
> Good catch. In BufferSync() we have 'num_to_scan' (buffers_total)
> which indicates the total number of buffers to be processed. Based on
> that, the 'buffers_processed' and 'buffers_written' counter gets
> incremented. I meant these values may reach upto 'buffers_total'. The
> current pg_stat_progress_view support above information. There is
> another place when 'ckpt_bufs_written' gets incremented (In
> SlruInternalWritePage()). This increment is above the 'buffers_total'
> value and it is included in the server log message (checkpoint end)
> and not included in the view. I am a bit confused here. If we include
> this increment in the view then we cannot calculate the exact
> 'buffers_total' beforehand. Can we increment the 'buffers_toal' also
> when 'ckpt_bufs_written' gets incremented so that we can match the
> behaviour with checkpoint end message? Please share your thoughts.
> ---
>
> > I think we can add a couple of more information to this view -
> > start_time for buffer write operation and start_time for buffer sync
> > operation. These are two very time consuming tasks in a checkpoint and
> > people would find it useful to know how much time is being taken by
> > the checkpoint in I/O operation phase. thoughts?
>
> I felt the detailed progress is getting shown for these 2 phases of
> the checkpoint like 'buffers_processed', 'buffers_written' and
> 'files_synced'. Hence I did not think about adding start time and If
> it is really required, then I can add.
>
> > Is it that hard or costly to do? Just sending a message to increment
> > the stat counter in RequestCheckpoint() would be enough.
> >
> > Also, unless I'm missing something it's still only showing the initial
> > checkpoint flags, so it's *not* showing what the checkpoint is really
> > doing, only what the checkpoint may be doing if nothing else happens.
> > It just feels wrong. You could even use that ckpt_flags info to know
> > that at least one backend has requested a new checkpoint, if you don't
> > want to have a number of backends.
>
> I think using ckpt_flags to display whether any new requests have been
> made or not is a good idea. I will include it in the next patch.
>
> Thanks & Regards,
> Nitin Jadhav
> On Thu, Mar 3, 2022 at 11:58 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 2, 2022 at 7:15 PM Nitin Jadhav
> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > >
> > > > > > As mentioned upthread, there can be multiple backends that request a
> > > > > > checkpoint, so unless we want to store an array of pid we should store a number
> > > > > > of backend that are waiting for a new checkpoint.
> > >
> > > It's a good metric to show in the view but the information is not
> > > readily available. Additional code is required to calculate the number
> > > of requests. Is it worth doing that? I feel this can be added later if
> > > required.
> >
> > Is it that hard or costly to do? Just sending a message to increment
> > the stat counter in RequestCheckpoint() would be enough.
> >
> > Also, unless I'm missing something it's still only showing the initial
> > checkpoint flags, so it's *not* showing what the checkpoint is really
> > doing, only what the checkpoint may be doing if nothing else happens.
> > It just feels wrong. You could even use that ckpt_flags info to know
> > that at least one backend has requested a new checkpoint, if you don't
> > want to have a number of backends.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2022-03-04 12:25:08 Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Previous Message Justin Pryzby 2022-03-04 11:44:06 Re: wal_compression=zstd