Re: when the startup process doesn't (logging startup delays)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-09-07 09:37:15
Message-ID: CAMm1aWbV6_7d3-J0RLKWMysB4fdjgTN_0zynav9Fi+tKepe2Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Looking over this version, I realized something I (or you) should have
> noticed sooner: you've added the RegisterTimeout call to
> InitPostgres(), but that's for things that are used by all backends,
> and this is only used by the startup process. So it seems to me that
> the right place is StartupProcessMain. That would have the further
> advantage of allowing startup_progress_timeout_handler to be made
> static. begin_startup_progress_phase() and
> startup_progress_timeout_has_expired() are the actual API functions
> though so they will need to remain extern.
>
> I agree with Robert that a standby server should not continuously show timing
> messages for WAL replay.

The earlier discussion wrt this point is as follows.

> > I also agree that this is the better place to do it. Hence modified
> > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > differentiate whether the call is done from a startup process or some
> > other process. Actually StartupXLOG() gets called in 2 places. one in
> > StartupProcessMain() and the other in InitPostgres(). As the logging
> > of the startup progress is required only during the startup process
> > and not in the other cases,
>
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.
>
> Perhaps during initdb we don't want messages, but I'm not sure that we
> need to do anything about that here. None of the messages that the
> server normally produces show up when you run initdb, so I guess they
> are getting redirected to /dev/null or something.
>
> So I don't think that using AmStartupProcess() for this purpose is right.

So as per the recent discussion, RegisterTimeout call should be
removed from InitPostgres() and the condition "!AmStartupProcess()" is
to be added in begin_startup_progress_phase() and
ereport_startup_progress() to differentiate whether the call is from a
startup process or some other process. Kindly correct me if I am
wrong.

> Some doc comments:

Thanks for the suggestions. I will take care in the next patch.

> Clearly. This should be limited to crash recovery, and maybe there
> could be some checks to make sure that nothing is logged once a
> consistent point is reached.

The purpose here is to show the progress of the operation if it is
taking longer than the interval set by the user until it completes the
operation. Users should know what operation is happening in the
background and to show the progress, displaying the elapsed time. So
according to me the consistent point is nothing but the end of the
operation. Kindly let me know if you have something in mind and that
could be the better consistent point.

> Honestly, I don't see why we should have
> a GUC for what's proposed here. A value too low would bloat the logs
> with entries that are not that meaningful. A value too large would
> just prevent access to some information wanted. Wouldn't it be better
> to just pick up a value like 10s or 20s?

It is difficult to finalise the value and use that value without
providing an option to change. If we choose one value (say 10s), it
may be too less for some users or too large for some other users. So I
feel it is better to provide an option to users so that they can
choose the value according to their need. Anyway the default value set
for this setting is 10s.

> + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> + gettext_noop("Sets the time interval between each progress update "
> + "of the startup process."),
> + gettext_noop("0 logs all messages. -1 turns this feature off."),
> + GUC_UNIT_MS,
> The unit is incorrect here, as that would default to 10ms, contrary to
> what the documentation says about 10s.

Kindly refer the previous few discussions wrt this point.

On Tue, Sep 7, 2021 at 10:58 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote:
> > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> > > Please find the updated patch attached.
> >
> > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg(at)mail(dot)gmail(dot)com
> >
> > I agree with Robert that a standby server should not continuously show timing
> > messages for WAL replay.
>
> Clearly. This should be limited to crash recovery, and maybe there
> could be some checks to make sure that nothing is logged once a
> consistent point is reached. Honestly, I don't see why we should have
> a GUC for what's proposed here. A value too low would bloat the logs
> with entries that are not that meaningful. A value too large would
> just prevent access to some information wanted. Wouldn't it be better
> to just pick up a value like 10s or 20s?
>
> Looking at v13..
>
> + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> + gettext_noop("Sets the time interval between each progress update "
> + "of the startup process."),
> + gettext_noop("0 logs all messages. -1 turns this feature off."),
> + GUC_UNIT_MS,
> The unit is incorrect here, as that would default to 10ms, contrary to
> what the documentation says about 10s.
> --
> Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-09-07 10:12:36 Re: Improve logging when using Huge Pages
Previous Message Fabien COELHO 2021-09-07 09:24:39 Re: Avoid stuck of pbgench due to skipped transactions