Re: Should vacuum process config file reload more often

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-05 18:55:50
Message-ID: CA+TgmoYDYHht9zsHye3tywx7oWGMP4s_6AGefHkDsMuGHLzT7w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 5, 2023 at 11:29 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Thanks all for the reviews.
>
> v16 attached. I put it together rather quickly, so there might be a few
> spurious whitespaces or similar. There is one rather annoying pgindent
> outlier that I have to figure out what to do about as well.
>
> The remaining functional TODOs that I know of are:
>
> - Resolve what to do about names of GUC and vacuum variables for cost
> limit and cost delay (since it may affect extensions)
>
> - Figure out what to do about the logging message which accesses dboid
> and tableoid (lock/no lock, where to put it, etc)
>
> - I see several places in docs which reference the balancing algorithm
> for autovac workers. I did not read them in great detail, but we may
> want to review them to see if any require updates.
>
> - Consider whether or not the initial two commits should just be
> squashed with the third commit
>
> - Anything else reviewers are still unhappy with

I really like having the first couple of patches split out -- it makes
them super-easy to understand. A committer can always choose to squash
at commit time if they want. I kind of wish the patch set were split
up more, for even easier understanding. I don't think that's a thing
to get hung up on, but it's an opinion that I have.

I strongly agree with the goals of the patch set, as I understand
them. Being able to change the config file and SIGHUP the server and
have the new values affect running autovacuum workers seems pretty
huge. It would make it possible to solve problems that currently can
only be solved by using gdb on a production instance, which is not a
fun thing to be doing.

+ /*
+ * Balance and update limit values for autovacuum workers. We must
+ * always do this in case the autovacuum launcher or another
+ * autovacuum worker has recalculated the number of workers across
+ * which we must balance the limit. This is done by the launcher when
+ * launching a new worker and by workers before vacuuming each table.
+ */

I don't quite understand what's going on here. A big reason that I'm
worried about this whole issue in the first place is that sometimes
there's a vacuum going on a giant table and you can't get it to go
fast. You want it to absorb new settings, and to do so quickly. I
realize that this is about the number of workers, not the actual cost
limit, so that makes what I'm about to say less important. But ... is
this often enough? Like, the time before we move on to the next table
could be super long. The time before a new worker is launched should
be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default
settings, so that's not horrible, but I'm kind of struggling to
understand the rationale for this particular choice. Maybe it's fine.

To be honest, I think that the whole system where we divide the cost
limit across the workers is the wrong idea. Does anyone actually like
that behavior? This patch probably shouldn't touch that, just in the
interest of getting something done that is an improvement over where
we are now, but I think this behavior is really counterintuitive.
People expect that they can increase autovacuum_max_workers to get
more vacuuming done, and actually in most cases that does not work.
And if that behavior didn't exist, this patch would also be a whole
lot simpler. Again, I don't think this is something we should try to
address right now under time pressure, but in the future, I think we
should consider ripping this behavior out.

+ if (autovacuum_vac_cost_limit > 0)
+ VacuumCostLimit = autovacuum_vac_cost_limit;
+ else
+ VacuumCostLimit = vacuum_cost_limit;
+
+ /* Only balance limit if no cost-related storage parameters specified */
+ if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance))
+ return;
+ Assert(VacuumCostLimit > 0);
+
+ nworkers_for_balance = pg_atomic_read_u32(&AutoVacuumShmem->av_nworkersForBalance);
+
+ /* There is at least 1 autovac worker (this worker). */
+ if (nworkers_for_balance <= 0)
+ elog(ERROR, "nworkers_for_balance must be > 0");
+
+ VacuumCostLimit = Max(VacuumCostLimit / nworkers_for_balance, 1);

I think it would be better stylistically to use a temporary variable
here and only assign the final value to VacuumCostLimit.

Daniel: Are you intending to commit this?

--
Robert Haas
EDB: http://www.enterprisedb.com
