Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-03-02 18:22:31
Message-ID: CALj2ACXsmvEDsHtADoQeDE-NVyBh4A-D80NmNanTUc37g=FQ-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > Also, how about special phases for SyncPostCheckpoint(),
> > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> > it might be increase in future (?)), TruncateSUBTRANS()?
>
> SyncPreCheckpoint() is just incrementing a counter and
> PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
> there is no need to add any phases for these as of now. We can add in
> the future if necessary. Added phases for SyncPostCheckpoint(),
> InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().

Okay.

> > 10) I'm not sure if it's discussed, how about adding the number of
> > snapshot/mapping files so far the checkpoint has processed in file
> > processing while loops of
> > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> > be many logical snapshot or mapping files and users may be interested
> > in knowing the so-far-processed-file-count.
>
> I had thought about this while sharing the v1 patch and mentioned my
> views upthread. I feel it won't give meaningful progress information
> (It can be treated as statistics). Hence not included. Thoughts?

Okay. If there are any complaints about it we can always add them later.

> > > > As mentioned upthread, there can be multiple backends that request a
> > > > checkpoint, so unless we want to store an array of pid we should store a number
> > > > of backend that are waiting for a new checkpoint.
> > >
> > > Yeah, you are right. Let's not go that path and store an array of
> > > pids. I don't see a strong use-case with the pid of the process
> > > requesting checkpoint. If required, we can add it later once the
> > > pg_stat_progress_checkpoint view gets in.
> >
> > I don't think that's really necessary to give the pid list.
> >
> > If you requested a new checkpoint, it doesn't matter if it's only your backend
> > that triggered it, another backend or a few other dozen, the result will be the
> > same and you have the information that the request has been seen. We could
> > store just a bool for that but having a number instead also gives a bit more
> > information and may allow you to detect some broken logic on your client code
> > if it keeps increasing.
>
> It's a good metric to show in the view but the information is not
> readily available. Additional code is required to calculate the number
> of requests. Is it worth doing that? I feel this can be added later if
> required.

Yes, we can always add it later if required.

> Please find the v4 patch attached and share your thoughts.

I reviewed v4 patch, here are my comments:

1) Can we convert below into pgstat_progress_update_multi_param, just
to avoid function calls?
pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo);
pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,

2) Why are we not having special phase for CheckPointReplicationOrigin
as it does good bunch of work (writing to disk, XLogFlush,
durable_rename) especially when max_replication_slots is large?

3) I don't think "requested" is necessary here as it doesn't add any
value or it's not a checkpoint kind or such, you can remove it.

4) s:/'recycling old XLOG files'/'recycling old WAL files'
+ WHEN 16 THEN 'recycling old XLOG files'

5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition
next to pg_stat_progress_copy in system_view.sql? It looks like all
the progress reporting views are next to each other.

6) How about shutdown and end-of-recovery checkpoint? Are you planning
to have an ereport_startup_progress mechanism as 0002?

7) I think you don't need to call checkpoint_progress_start and
pgstat_progress_update_param, any other progress reporting function
for shutdown and end-of-recovery checkpoint right?

8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint
can't show progress report for shutdown and end-of-recovery
checkpoint, I think you need to specify that here in wal.sgml and
checkpoint.sgml.
+ command <command>CHECKPOINT</command>. The checkpointer process running the
+ checkpoint will report its progress in the
+ <structname>pg_stat_progress_checkpoint</structname> view. See
+ <xref linkend="checkpoint-progress-reporting"/> for details.

9) Can you add a test case for pg_stat_progress_checkpoint view? I
think it's good to add one. See, below for reference:
-- Add a trigger to catch and print the contents of the catalog view
-- pg_stat_progress_copy during data insertion. This allows to test
-- the validation of some progress reports for COPY FROM where the trigger
-- would fire.
create function notice_after_tab_progress_reporting() returns trigger AS
$$
declare report record;

10) Typo: it's not "is happens"
+ The checkpoint is happens without delays.

11) Can you be specific what are those "some operations" that forced a
checkpoint? May be like, basebackup, createdb or something?
+ The checkpoint is started because some operation forced a checkpoint.

12) Can you be a bit elobartive here who waits? Something like the
backend that requested checkpoint will wait until it's completion ....
+ Wait for completion before returning.

13) "removing unneeded or flushing needed logical rewrite mapping files"
+ The checkpointer process is currently removing/flushing the logical

14) "old WAL files"
+ The checkpointer process is currently recycling old XLOG files.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2022-03-02 19:07:12 Re: [PATCH] Add reloption for views to enable RLS
Previous Message Pavel Stehule 2022-03-02 18:08:12 Re: [Proposal] Global temporary tables