Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-03-08 14:55:28
Message-ID: CAMm1aWaFmbc8QCbcPkG7ROV7gjshamS=sJRJd=oghLJM9uOnpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > 11) Can you be specific what are those "some operations" that forced a
> > checkpoint? May be like, basebackup, createdb or something?
> > + The checkpoint is started because some operation forced a checkpoint.
> >
> I will take care in the next patch.

I feel mentioning/listing the specific operation makes it difficult to
maintain the document. If we add any new functionality in future which
needs a force checkpoint, then there is a high chance that we will
miss to update here. Hence modified it to "The checkpoint is started
because some operation (for which the checkpoint is necessary) is
forced the checkpoint".

Fixed other comments as per the discussion above.
Please find the v5 patch attached and share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Mon, Mar 7, 2022 at 7:45 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> > 1) Can we convert below into pgstat_progress_update_multi_param, just
> > to avoid function calls?
> > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo);
> > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
> >
> > 2) Why are we not having special phase for CheckPointReplicationOrigin
> > as it does good bunch of work (writing to disk, XLogFlush,
> > durable_rename) especially when max_replication_slots is large?
> >
> > 3) I don't think "requested" is necessary here as it doesn't add any
> > value or it's not a checkpoint kind or such, you can remove it.
> >
> > 4) s:/'recycling old XLOG files'/'recycling old WAL files'
> > + WHEN 16 THEN 'recycling old XLOG files'
> >
> > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition
> > next to pg_stat_progress_copy in system_view.sql? It looks like all
> > the progress reporting views are next to each other.
>
> I will take care in the next patch.
> ---
>
> > 6) How about shutdown and end-of-recovery checkpoint? Are you planning
> > to have an ereport_startup_progress mechanism as 0002?
>
> I thought of including it earlier then I felt lets first make the
> current patch stable. Once all the fields are properly decided and the
> patch gets in then we can easily extend the functionality to shutdown
> and end-of-recovery cases. I have also observed that the timer
> functionality wont work properly in case of shutdown as we are doing
> an immediate checkpoint. So this needs a lot of discussion and I would
> like to handle this on a separate thread.
> ---
>
> > 7) I think you don't need to call checkpoint_progress_start and
> > pgstat_progress_update_param, any other progress reporting function
> > for shutdown and end-of-recovery checkpoint right?
>
> I had included the guards earlier and then removed later based on the
> discussion upthread.
> ---
>
> > 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint
> > can't show progress report for shutdown and end-of-recovery
> > checkpoint, I think you need to specify that here in wal.sgml and
> > checkpoint.sgml.
> > + command <command>CHECKPOINT</command>. The checkpointer process running the
> > + checkpoint will report its progress in the
> > + <structname>pg_stat_progress_checkpoint</structname> view. See
> > + <xref linkend="checkpoint-progress-reporting"/> for details.
> >
> > 9) Can you add a test case for pg_stat_progress_checkpoint view? I
> > think it's good to add one. See, below for reference:
> > -- Add a trigger to catch and print the contents of the catalog view
> > -- pg_stat_progress_copy during data insertion. This allows to test
> > -- the validation of some progress reports for COPY FROM where the trigger
> > -- would fire.
> > create function notice_after_tab_progress_reporting() returns trigger AS
> > $$
> > declare report record;
> >
> > 10) Typo: it's not "is happens"
> > + The checkpoint is happens without delays.
> >
> > 11) Can you be specific what are those "some operations" that forced a
> > checkpoint? May be like, basebackup, createdb or something?
> > + The checkpoint is started because some operation forced a checkpoint.
> >
> > 12) Can you be a bit elobartive here who waits? Something like the
> > backend that requested checkpoint will wait until it's completion ....
> > + Wait for completion before returning.
> >
> > 13) "removing unneeded or flushing needed logical rewrite mapping files"
> > + The checkpointer process is currently removing/flushing the logical
> >
> > 14) "old WAL files"
> > + The checkpointer process is currently recycling old XLOG files.
>
> I will take care in the next patch.
>
> Thanks & Regards,
> Nitin Jadhav
>
> On Wed, Mar 2, 2022 at 11:52 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav
> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > > > Also, how about special phases for SyncPostCheckpoint(),
> > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> > > > it might be increase in future (?)), TruncateSUBTRANS()?
> > >
> > > SyncPreCheckpoint() is just incrementing a counter and
> > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
> > > there is no need to add any phases for these as of now. We can add in
> > > the future if necessary. Added phases for SyncPostCheckpoint(),
> > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().
> >
> > Okay.
> >
> > > > 10) I'm not sure if it's discussed, how about adding the number of
> > > > snapshot/mapping files so far the checkpoint has processed in file
> > > > processing while loops of
> > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> > > > be many logical snapshot or mapping files and users may be interested
> > > > in knowing the so-far-processed-file-count.
> > >
> > > I had thought about this while sharing the v1 patch and mentioned my
> > > views upthread. I feel it won't give meaningful progress information
> > > (It can be treated as statistics). Hence not included. Thoughts?
> >
> > Okay. If there are any complaints about it we can always add them later.
> >
> > > > > > As mentioned upthread, there can be multiple backends that request a
> > > > > > checkpoint, so unless we want to store an array of pid we should store a number
> > > > > > of backend that are waiting for a new checkpoint.
> > > > >
> > > > > Yeah, you are right. Let's not go that path and store an array of
> > > > > pids. I don't see a strong use-case with the pid of the process
> > > > > requesting checkpoint. If required, we can add it later once the
> > > > > pg_stat_progress_checkpoint view gets in.
> > > >
> > > > I don't think that's really necessary to give the pid list.
> > > >
> > > > If you requested a new checkpoint, it doesn't matter if it's only your backend
> > > > that triggered it, another backend or a few other dozen, the result will be the
> > > > same and you have the information that the request has been seen. We could
> > > > store just a bool for that but having a number instead also gives a bit more
> > > > information and may allow you to detect some broken logic on your client code
> > > > if it keeps increasing.
> > >
> > > It's a good metric to show in the view but the information is not
> > > readily available. Additional code is required to calculate the number
> > > of requests. Is it worth doing that? I feel this can be added later if
> > > required.
> >
> > Yes, we can always add it later if required.
> >
> > > Please find the v4 patch attached and share your thoughts.
> >
> > I reviewed v4 patch, here are my comments:
> >
> > 1) Can we convert below into pgstat_progress_update_multi_param, just
> > to avoid function calls?
> > pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo);
> > pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
> >
> > 2) Why are we not having special phase for CheckPointReplicationOrigin
> > as it does good bunch of work (writing to disk, XLogFlush,
> > durable_rename) especially when max_replication_slots is large?
> >
> > 3) I don't think "requested" is necessary here as it doesn't add any
> > value or it's not a checkpoint kind or such, you can remove it.
> >
> > 4) s:/'recycling old XLOG files'/'recycling old WAL files'
> > + WHEN 16 THEN 'recycling old XLOG files'
> >
> > 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition
> > next to pg_stat_progress_copy in system_view.sql? It looks like all
> > the progress reporting views are next to each other.
> >
> > 6) How about shutdown and end-of-recovery checkpoint? Are you planning
> > to have an ereport_startup_progress mechanism as 0002?
> >
> > 7) I think you don't need to call checkpoint_progress_start and
> > pgstat_progress_update_param, any other progress reporting function
> > for shutdown and end-of-recovery checkpoint right?
> >
> > 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint
> > can't show progress report for shutdown and end-of-recovery
> > checkpoint, I think you need to specify that here in wal.sgml and
> > checkpoint.sgml.
> > + command <command>CHECKPOINT</command>. The checkpointer process running the
> > + checkpoint will report its progress in the
> > + <structname>pg_stat_progress_checkpoint</structname> view. See
> > + <xref linkend="checkpoint-progress-reporting"/> for details.
> >
> > 9) Can you add a test case for pg_stat_progress_checkpoint view? I
> > think it's good to add one. See, below for reference:
> > -- Add a trigger to catch and print the contents of the catalog view
> > -- pg_stat_progress_copy during data insertion. This allows to test
> > -- the validation of some progress reports for COPY FROM where the trigger
> > -- would fire.
> > create function notice_after_tab_progress_reporting() returns trigger AS
> > $$
> > declare report record;
> >
> > 10) Typo: it's not "is happens"
> > + The checkpoint is happens without delays.
> >
> > 11) Can you be specific what are those "some operations" that forced a
> > checkpoint? May be like, basebackup, createdb or something?
> > + The checkpoint is started because some operation forced a checkpoint.
> >
> > 12) Can you be a bit elobartive here who waits? Something like the
> > backend that requested checkpoint will wait until it's completion ....
> > + Wait for completion before returning.
> >
> > 13) "removing unneeded or flushing needed logical rewrite mapping files"
> > + The checkpointer process is currently removing/flushing the logical
> >
> > 14) "old WAL files"
> > + The checkpointer process is currently recycling old XLOG files.
> >
> > Regards,
> > Bharath Rupireddy.

Attachment Content-Type Size
v5-0001-pg_stat_progress_checkpoint-view.patch application/octet-stream 38.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Jadhav 2022-03-08 15:00:49 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Previous Message Dong Wook Lee 2022-03-08 14:54:12 Re: Add pg_freespacemap extension sql test