Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-02-22 06:38:33
Message-ID: CAMm1aWZzPVpLs7k5sQU4DObvh6YpOhjibcq2ykDrC_n25jeJNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Thank you for sharing the information. 'triggering backend PID' (int)
> > - can be stored without any problem. 'checkpoint or restartpoint?'
> > (boolean) - can be stored as a integer value like
> > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
> > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
> > start time in stat_progress, timestamp fits in 64 bits) - As
> > Timestamptz is of type int64 internally, so we can store the timestamp
> > value in the progres parameter and then expose a function like
> > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
> > Timestamptz) as argument and then returns string representing the
> > elapsed time.
>
> No need to use a string there; I think exposing the checkpoint start
> time is good enough. The conversion of int64 to timestamp[tz] can be
> done in SQL (although I'm not sure that exposing the internal bitwise
> representation of Interval should be exposed to that extent) [0].
> Users can then extract the duration interval using now() - start_time,
> which also allows the user to use their own preferred formatting.

The reason for showing the elapsed time rather than exposing the
timestamp directly is in case of checkpoint during shutdown and
end-of-recovery, I am planning to log a message in server logs using
'log_startup_progress_interval' infrastructure which displays elapsed
time. So just to match both of the behaviour I am displaying elapsed
time here. I feel that elapsed time gives a quicker feel of the
progress. Kindly let me know if you still feel just exposing the
timestamp is better than showing the elapsed time.

> > 'checkpoint start location' (lsn = uint64) - I feel we
> > cannot use progress parameters for this case. As assigning uint64 to
> > int64 type would be an issue for larger values and can lead to hidden
> > bugs.
>
> Not necessarily - we can (without much trouble) do a bitwise cast from
> uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not
> very elegant, but it works quite well.
>
> [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN
> pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) +
> stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE
> */ AS my_bigint_lsn) AS stat(my_int64);

Thanks for sharing. It works. I will include this in the next patch.
On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote:
> >
> > The backend_pid contains a valid value only during
> > the CHECKPOINT command issued by the backend explicitly, otherwise the
> > value will be 0. We may have to add an additional field to
> > 'CheckpointerShmemStruct' to hold the backend pid. The backend
> > requesting the checkpoint will update its pid to this structure.
> > Kindly let me know if you still feel the backend_pid field is not
> > necessary.
>
> There are more scenarios where you can have a baackend requesting a checkpoint
> and waiting for its completion, and there may be more than one backend
> concerned, so I don't think that storing only one / the first backend pid is
> ok.
>
> > > And also while looking at the patch I see there's the same problem that I
> > > mentioned in the previous thread, which is that the effective flags can be
> > > updated once the checkpoint started, and as-is the view won't reflect that. It
> > > also means that you can't simply display one of wal, time or force but a
> > > possible combination of the flags (including the one not handled in v1).
> >
> > If I understand the above comment properly, it has 2 points. First is
> > to display the combination of flags rather than just displaying wal,
> > time or force - The idea behind this is to just let the user know the
> > reason for checkpointing. That is, the checkpoint is started because
> > max_wal_size is reached or checkpoint_timeout expired or explicitly
> > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE,
> > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint
> > has to be performed. Hence I have not included those in the view. If
> > it is really required, I would like to modify the code to include
> > other flags and display the combination.
>
> I think all the information should be exposed. Only knowing why the current
> checkpoint has been triggered without any further information seems a bit
> useless. Think for instance for cases like [1].
>
> > Second point is to reflect
> > the updated flags in the view. AFAIK, there is a possibility that the
> > flags get updated during the on-going checkpoint but the reason for
> > checkpoint (wal, time or force) will remain same for the current
> > checkpoint. There might be a change in how checkpoint has to be
> > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with
> > displaying the combination of flags in the view, then probably we may
> > have to reflect this in the view.
>
> You can only "upgrade" a checkpoint, but not "downgrade" it. So if for
> instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is
> possible) you can easily know which one was the one that triggered the
> checkpoint and which one was added later.
>
> > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be
> > > > added for this purpose.
> > >
> > > Maybe num_process_waiting?
> >
> > I feel 'processes_wiating' aligns more with the naming conventions of
> > the fields of the existing progres views.
>
> There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have
> a strong opinion on it, just make sure to correct the typo.
>
> > > > Probably writing of buffers or syncing files may complete before
> > > > pg_is_in_recovery() returns false. But there are some cleanup
> > > > operations happen as part of the checkpoint. During this scenario, we
> > > > may get false value for pg_is_in_recovery(). Please refer following
> > > > piece of code which is present in CreateRestartpoint().
> > > >
> > > > if (!RecoveryInProgress())
> > > > replayTLI = XLogCtl->InsertTimeLineID;
> > >
> > > Then maybe we could store the timeline rather then then kind of checkpoint?
> > > You should still be able to compute the information while giving a bit more
> > > information for the same memory usage.
> >
> > Can you please describe more about how checkpoint/restartpoint can be
> > confirmed using the timeline id.
>
> If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a
> restartpoint if the checkpoint's timeline is different from the current
> timeline?
>
> [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message kuroda.hayato@fujitsu.com 2022-02-22 06:41:34 RE: [Proposal] Add foreign-server health checks infrastructure
Previous Message Kyotaro Horiguchi 2022-02-22 06:14:01 Re: Patch a potential memory leak in describeOneTableDetails()