Re: Should vacuum process config file reload more often

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-07 13:07:46
Message-ID: CAAKRu_bETD+Ari600h6fRjX2p8rJSeMAUp=_y88juqOZgouTSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 7, 2023 at 2:53 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Apr 7, 2023 at 8:08 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> >
> > > On 7 Apr 2023, at 00:12, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > >>
> > >>> On 6 Apr 2023, at 23:06, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > >>
> > >>> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> > >>> limit or cost delay have been changed. If they have, they assert that
> > >>> they don't already hold the AutovacuumLock, take it in shared mode, and
> > >>> do the logging.
> > >>
> > >> Another idea would be to copy the values to local temp variables while holding
> > >> the lock, and release the lock before calling elog() to avoid holding the lock
> > >> over potential IO.
> > >
> > > Good idea. I've done this in attached v19.
> > > Also I looked through the docs and everything still looks correct for
> > > balancing algo.
> >
> > I had another read-through and test-through of this version, and have applied
> > it with some minor changes to comments and whitespace. Thanks for the quick
> > turnaround times on reviews in this thread!
>
> Cool!
>
> Regarding the commit 7d71d3dd08, I have one comment:
>
> + /* Only log updates to cost-related variables */
> + if (vacuum_cost_delay == original_cost_delay &&
> + vacuum_cost_limit == original_cost_limit)
> + return;
>
> IIUC by default, we log not only before starting the vacuum but also
> when changing cost-related variables. Which is good, I think, because
> logging the initial values would also be helpful for investigation.
> However, I think that we don't log the initial vacuum cost values
> depending on the values. For example, if the
> autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
> the initial values. I think that instead of comparing old and new
> values, we can write the log only if
> message_level_is_interesting(DEBUG2) is true. That way, we don't need
> to acquire the lwlock unnecessarily. And the code looks cleaner to me.
> I've attached the patch (use_message_level_is_interesting.patch)

Thanks for coming up with the case you thought of with storage param for
cost delay = 0. In that case we wouldn't print the message initially and
we should fix that.

I disagree, however, that we should condition it only on
message_level_is_interesting().

Actually, outside of printing initial values when the autovacuum worker
first starts (before vacuuming all tables), I don't think we should log
these values except when they are being updated. Autovacuum workers
could vacuum tons of small tables and having this print out at least
once per table (which I know is how it is on master) would be
distracting. Also, you could be reloading the config to update some
other GUCs and be oblivious to an ongoing autovacuum and get these
messages printed out, which I would also find distracting.

You will have to stare very hard at the logs to tell if your changes to
vacuum cost delay and limit took effect when you reload config. I think
with our changes to update the values more often, we should take the
opportunity to make this logging more useful by making it happen only
when the values are changed.

I would be open to elevating the log level to DEBUG1 for logging only
updates and, perhaps, having an option if you set log level to DEBUG2,
for example, to always log these values in VacuumUpdateCosts().

I'd even argue that, potentially, having the cost-delay related
parameters printed at the beginning of vacuuming could be interesting to
regular VACUUM as well (even though it doesn't benefit from config
reload while in progress).

To fix the issue you mentioned and ensure the logging is printed when
autovacuum workers start up before vacuuming tables, we could either
initialize vacuum_cost_delay and vacuum_cost_limit to something invalid
that will always be different than what they are set to in
VacuumUpdateCosts() (not sure if this poses a problem for VACUUM using
these values since they are set to the defaults for VACUUM). Or, we
could duplicate this logging message in do_autovacuum().

Finally, one other point about message_level_is_interesting(). I liked
the idea of using it a lot, since log level DEBUG2 will not be the
common case. I thought of it but hesitated because all other users of
message_level_is_interesting() are avoiding some memory allocation or
string copying -- not avoiding take a lock. Making this conditioned on
log level made me a bit uncomfortable. I can't think of a situation when
it would be a problem, but it felt a bit off.

> Also, while testing the autovacuum delay with relopt
> autovacuum_vacuum_cost_delay = 0, I realized that even if we set
> autovacuum_vacuum_cost_delay = 0 to a table, wi_dobalance is set to
> true. wi_dobalance comes from the following expression:
>
> /*
> * If any of the cost delay parameters has been set individually for
> * this table, disable the balancing algorithm.
> */
> tab->at_dobalance =
> !(avopts && (avopts->vacuum_cost_limit > 0 ||
> avopts->vacuum_cost_delay > 0));
>
> The initial values of both avopts->vacuum_cost_limit and
> avopts->vacuum_cost_delay are -1. I think we should use ">= 0" instead
> of "> 0". Otherwise, we include the autovacuum worker working on a
> table whose autovacuum_vacuum_cost_delay is 0 to the balancing
> algorithm. Probably this behavior has existed also on back branches
> but I haven't checked it yet.

Thank you for catching this. Indeed this exists in master since
1021bd6a89b which was backpatched. I checked and it is true all the way
back through REL_11_STABLE.

Definitely seems worth fixing as it kind of defeats the purpose of the
original commit. I wish I had noticed before!

Your fix has:
!(avopts && (avopts->vacuum_cost_limit >= 0 ||
avopts->vacuum_cost_delay >= 0));

And though delay is required to be >= 0
avopts->vacuum_cost_delay >= 0

Limit does not. It can just be > 0.

postgres=# create table foo (a int) with (autovacuum_vacuum_cost_limit = 0);
ERROR: value 0 out of bounds for option "autovacuum_vacuum_cost_limit"
DETAIL: Valid values are between "1" and "10000".

Though >= is also fine, the rest of the code in all versions always
checks if limit > 0 and delay >= 0 since 0 is a valid value for delay
and not for limit. Probably best we keep it consistent (though the whole
thing is quite confusing).

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-04-07 13:23:13 Re: Should vacuum process config file reload more often
Previous Message Hayato Kuroda (Fujitsu) 2023-04-07 12:51:51 RE: [PoC] pg_upgrade: allow to upgrade publisher node