Re: when the startup process doesn't (logging startup delays)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-09-27 11:27:20
Message-ID: CAMm1aWbZvHuodydSeV5=K8iuF_Qf+OaY9zFBzNAb3-B5umjWng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical.

Based on the statement "I suggest making the GUC GUC_UNIT_MS rather
than GUC_UNIT_S, but expressing the default in postgresl.conf.sample
as 10s rather than 10000ms", I have used the default value in the
postgresl.conf.sample as 10s rather than 10000ms. So I just used the
same value in the example too in config.sgml. If it is really getting
confusing, I will change it to 100ms in config.sgml.

> It also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.

Added a few lines about that. "For example, if you set it to 1000ms,
then it tries to emit a log every 1000ms. If the log message is not
available at every 100th millisecond, then there is a possibility of
delay in emitting the log. If the delay is more than a cycle or if the
system clock gets set backwards then the next attempt is done based on
the last logging time, otherwise the delay gets adjusted in the next
attempt."

Please correct the explanation if it does not meet your expectations.

> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.

Fixed.

> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what anyone wants. I think you should make 0
> mean disabled, and a positive value mean log at that interval, and
> disallow -1 altogether.

Made changes which indicate 0 mean disabled, > 0 mean interval in
millisecond and removed -1.

Please find the patch attached.

On Thu, Sep 23, 2021 at 9:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
> <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > Thanks Justin for the detailed explanation. Done the necessary changes.
>
> Not really. The documentation here does not make a ton of sense:
>
> + Sets the time interval between each progress update of the operations
> + performed by the startup process. This produces the log messages for
> + those operations which take longer than the specified
> duration. The unit
> + used to specify the value is milliseconds. For example, if
> you set it to
> + <literal> 10s </literal>, then every <literal> 10s
> </literal>, a log is
> + emitted indicating which operation is ongoing, and the
> elapsed time from
> + the beginning of the operation. If the value is set to
> <literal> 0 </literal>,
> + then all messages for such operations are logged. <literal>
> -1 </literal>
> + disables the feature. The default value is <literal> 10s </literal>
>
> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical. It
> also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.
> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.
>
> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what anyone wants. I think you should make 0
> mean disabled, and a positive value mean log at that interval, and
> disallow -1 altogether.
>
> And on that note, I tested your patch with
> log_startup_progress_interval=-1 and found that -1 behaves just like
> 0. In other words, contrary to what the documentation says, -1 does
> not disable the feature. It instead behaves just like 0. It's really
> confusing to me how you write documentation that says -1 has a certain
> behavior without thinking about the fact that you haven't written any
> code that would make -1 behave that way. And apparently you didn't
> test it, either. It took me approximately 1 minute of testing to find
> that this is broken, which really is not very much.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v17-0001-startup-process-progress.patch application/octet-stream 15.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-09-27 11:43:55 Re: Column Filtering in Logical Replication
Previous Message Greg Nancarrow 2021-09-27 11:21:02 Re: Added schema level support for publication.