Re: Should vacuum process config file reload more often

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-11 15:05:16
Message-ID: CAD21AoD22LdRdvwmiybmOpr+0d1gCEMyOLs9CoU46rycZb-mJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 7, 2023 at 10:23 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
> > On 7 Apr 2023, at 15:07, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > On Fri, Apr 7, 2023 at 2:53 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >> + /* Only log updates to cost-related variables */
> >> + if (vacuum_cost_delay == original_cost_delay &&
> >> + vacuum_cost_limit == original_cost_limit)
> >> + return;
> >>
> >> IIUC by default, we log not only before starting the vacuum but also
> >> when changing cost-related variables. Which is good, I think, because
> >> logging the initial values would also be helpful for investigation.
> >> However, I think that we don't log the initial vacuum cost values
> >> depending on the values. For example, if the
> >> autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
> >> the initial values. I think that instead of comparing old and new
> >> values, we can write the log only if
> >> message_level_is_interesting(DEBUG2) is true. That way, we don't need
> >> to acquire the lwlock unnecessarily. And the code looks cleaner to me.
> >> I've attached the patch (use_message_level_is_interesting.patch)
> >
> > Thanks for coming up with the case you thought of with storage param for
> > cost delay = 0. In that case we wouldn't print the message initially and
> > we should fix that.
> >
> > I disagree, however, that we should condition it only on
> > message_level_is_interesting().
>
> I think we should keep the logging frequency as committed, but condition taking
> the lock on message_level_is_interesting().
>
> > Actually, outside of printing initial values when the autovacuum worker
> > first starts (before vacuuming all tables), I don't think we should log
> > these values except when they are being updated. Autovacuum workers
> > could vacuum tons of small tables and having this print out at least
> > once per table (which I know is how it is on master) would be
> > distracting. Also, you could be reloading the config to update some
> > other GUCs and be oblivious to an ongoing autovacuum and get these
> > messages printed out, which I would also find distracting.
> >
> > You will have to stare very hard at the logs to tell if your changes to
> > vacuum cost delay and limit took effect when you reload config. I think
> > with our changes to update the values more often, we should take the
> > opportunity to make this logging more useful by making it happen only
> > when the values are changed.
> >

For debugging purposes, I think it could also be important information
that the cost values are not changed. Personally, I prefer to log the
current state rather than deciding for ourselves which events are
important. If always logging these values in DEBUG2 had been
distracting, we might want to lower it to DEBUG3.

> > I would be open to elevating the log level to DEBUG1 for logging only
> > updates and, perhaps, having an option if you set log level to DEBUG2,
> > for example, to always log these values in VacuumUpdateCosts().

I'm not really sure it's a good idea to change the log messages and
events depending on elevel. Do you know we have any precedents ?

> >
> > I'd even argue that, potentially, having the cost-delay related
> > parameters printed at the beginning of vacuuming could be interesting to
> > regular VACUUM as well (even though it doesn't benefit from config
> > reload while in progress).
> >
> > To fix the issue you mentioned and ensure the logging is printed when
> > autovacuum workers start up before vacuuming tables, we could either
> > initialize vacuum_cost_delay and vacuum_cost_limit to something invalid
> > that will always be different than what they are set to in
> > VacuumUpdateCosts() (not sure if this poses a problem for VACUUM using
> > these values since they are set to the defaults for VACUUM). Or, we
> > could duplicate this logging message in do_autovacuum().
>
> Duplicating logging, maybe with a slightly tailored message, seem the least
> bad option.
>
> > Finally, one other point about message_level_is_interesting(). I liked
> > the idea of using it a lot, since log level DEBUG2 will not be the
> > common case. I thought of it but hesitated because all other users of
> > message_level_is_interesting() are avoiding some memory allocation or
> > string copying -- not avoiding take a lock. Making this conditioned on
> > log level made me a bit uncomfortable. I can't think of a situation when
> > it would be a problem, but it felt a bit off.
>
> Considering how uncommon DEBUG2 will be in production, I think conditioning
> taking a lock on it makes sense.

The comment of message_level_is_interesting() says:

* This is useful to short-circuit any expensive preparatory work that
* might be needed for a logging message.

Which can apply to taking a lwlock, I think.

>
> >> Also, while testing the autovacuum delay with relopt
> >> autovacuum_vacuum_cost_delay = 0, I realized that even if we set
> >> autovacuum_vacuum_cost_delay = 0 to a table, wi_dobalance is set to
> >> true. wi_dobalance comes from the following expression:
> >>
> >> /*
> >> * If any of the cost delay parameters has been set individually for
> >> * this table, disable the balancing algorithm.
> >> */
> >> tab->at_dobalance =
> >> !(avopts && (avopts->vacuum_cost_limit > 0 ||
> >> avopts->vacuum_cost_delay > 0));
> >>
> >> The initial values of both avopts->vacuum_cost_limit and
> >> avopts->vacuum_cost_delay are -1. I think we should use ">= 0" instead
> >> of "> 0". Otherwise, we include the autovacuum worker working on a
> >> table whose autovacuum_vacuum_cost_delay is 0 to the balancing
> >> algorithm. Probably this behavior has existed also on back branches
> >> but I haven't checked it yet.
> >
> > Thank you for catching this. Indeed this exists in master since
> > 1021bd6a89b which was backpatched. I checked and it is true all the way
> > back through REL_11_STABLE.

Thanks for checking!

> >
> > Definitely seems worth fixing as it kind of defeats the purpose of the
> > original commit. I wish I had noticed before!
> >
> > Your fix has:
> > !(avopts && (avopts->vacuum_cost_limit >= 0 ||
> > avopts->vacuum_cost_delay >= 0));
> >
> > And though delay is required to be >= 0
> > avopts->vacuum_cost_delay >= 0
> >
> > Limit does not. It can just be > 0.
> >
> > postgres=# create table foo (a int) with (autovacuum_vacuum_cost_limit = 0);
> > ERROR: value 0 out of bounds for option "autovacuum_vacuum_cost_limit"
> > DETAIL: Valid values are between "1" and "10000".
> >
> > Though >= is also fine, the rest of the code in all versions always
> > checks if limit > 0 and delay >= 0 since 0 is a valid value for delay
> > and not for limit. Probably best we keep it consistent (though the whole
> > thing is quite confusing).
>
> +1

+1. I misunderstood the initial value of autovacuum_vacuum_cost_limit reloption.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-04-11 15:09:05 Re: running logical replication as the subscription owner
Previous Message Justin Pryzby 2023-04-11 14:53:00 Re: Various typo fixes