Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-02-22 18:42:54
Message-ID: CAEze2Wh3dmGhsnkvv9D3krU9m+o86SEXnBdRQB=krjFgX2ANRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 22 Feb 2022 at 07:39, Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> > > Thank you for sharing the information. 'triggering backend PID' (int)
> > > - can be stored without any problem. 'checkpoint or restartpoint?'
> > > (boolean) - can be stored as a integer value like
> > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
> > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
> > > start time in stat_progress, timestamp fits in 64 bits) - As
> > > Timestamptz is of type int64 internally, so we can store the timestamp
> > > value in the progres parameter and then expose a function like
> > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
> > > Timestamptz) as argument and then returns string representing the
> > > elapsed time.
> >
> > No need to use a string there; I think exposing the checkpoint start
> > time is good enough. The conversion of int64 to timestamp[tz] can be
> > done in SQL (although I'm not sure that exposing the internal bitwise
> > representation of Interval should be exposed to that extent) [0].
> > Users can then extract the duration interval using now() - start_time,
> > which also allows the user to use their own preferred formatting.
>
> The reason for showing the elapsed time rather than exposing the
> timestamp directly is in case of checkpoint during shutdown and
> end-of-recovery, I am planning to log a message in server logs using
> 'log_startup_progress_interval' infrastructure which displays elapsed
> time. So just to match both of the behaviour I am displaying elapsed
> time here. I feel that elapsed time gives a quicker feel of the
> progress. Kindly let me know if you still feel just exposing the
> timestamp is better than showing the elapsed time.

At least for pg_stat_progress_checkpoint, storing only a timestamp in
the pg_stat storage (instead of repeatedly updating the field as a
duration) seems to provide much more precise measures of 'time
elapsed' for other sessions if one step of the checkpoint is taking a
long time.

I understand the want to integrate the log-based reporting in the same
API, but I don't think that is necessarily the right approach:
pg_stat_progress_* has low-overhead infrastructure specifically to
ensure that most tasks will not run much slower while reporting, never
waiting for locks. Logging, however, needs to take locks (if only to
prevent concurrent writes to the output file at a kernel level) and
thus has a not insignificant overhead and thus is not very useful for
precise and very frequent statistics updates.

So, although similar in nature, I don't think it is smart to use the
exact same infrastructure between pgstat_progress*-based reporting and
log-based progress reporting, especially if your logging-based
progress reporting is not intended to be a debugging-only
configuration option similar to log_min_messages=DEBUG[1..5].

- Matthias

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imseih (AWS), Sami 2022-02-22 19:20:55 [BUG] Panic due to incorrect missingContrecPtr after promotion
Previous Message Andres Freund 2022-02-22 18:19:24 Re: bailing out in tap tests nearly always a bad idea