Re: when the startup process doesn't (logging startup delays)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-09-23 16:14:03
Message-ID: CA+TgmoZPy4rbozrHRXuJpwnoWqyA=ukrwZRZ5nJF2K8466-eBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
> Thanks Justin for the detailed explanation. Done the necessary changes.

Not really. The documentation here does not make a ton of sense:

+ Sets the time interval between each progress update of the operations
+ performed by the startup process. This produces the log messages for
+ those operations which take longer than the specified
duration. The unit
+ used to specify the value is milliseconds. For example, if
you set it to
+ <literal> 10s </literal>, then every <literal> 10s
</literal>, a log is
+ emitted indicating which operation is ongoing, and the
elapsed time from
+ the beginning of the operation. If the value is set to
<literal> 0 </literal>,
+ then all messages for such operations are logged. <literal>
-1 </literal>
+ disables the feature. The default value is <literal> 10s </literal>

I really don't know what to say about this. You say that the time is
measured in milliseconds, and then immediately turn around and say
"For example, if you set it to 10s". Now we do expect that most people
will set it to intervals that are measured in seconds rather than
milliseconds, but saying that setting it to a value measured in
seconds is an example of setting it in milliseconds is not logical. It
also looks pretty silly to say that if you set the value to 10s,
something will happen every 10s. What else would anyone expect to
happen? You really need to give some thought to how to explain the
behavior in a way that is clear and logical but not overly wordy.
Also, please note that you've got spaces around the literals, which
does not match the surrounding style and does not render properly in
HTML.

It is also not logical to define 0 as meaning that "all messages for
such operations are logged". What does that even mean? It makes sense
for something like log_autovacuum_min_duration, because there we are
talking about logging one message per operation, and we could log
messages for all operations or just some of them. Here we are talking
about the time between one message and the next, so talking about "all
messages" does not really seem to make a lot of sense. Experimentally,
what 0 actually does is cause the system to spam log lines in a tight
loop, which cannot be what anyone wants. I think you should make 0
mean disabled, and a positive value mean log at that interval, and
disallow -1 altogether.

And on that note, I tested your patch with
log_startup_progress_interval=-1 and found that -1 behaves just like
0. In other words, contrary to what the documentation says, -1 does
not disable the feature. It instead behaves just like 0. It's really
confusing to me how you write documentation that says -1 has a certain
behavior without thinking about the fact that you haven't written any
code that would make -1 behave that way. And apparently you didn't
test it, either. It took me approximately 1 minute of testing to find
that this is broken, which really is not very much.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-09-23 16:16:47 Re: mark the timestamptz variant of date_bin() as stable
Previous Message Fabrice Chapuis 2021-09-23 16:03:56 Re: Logical replication timeout problem