Re: Should vacuum process config file reload more often

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-06 17:55:12
Message-ID: CA+Tgmoa5stGZxq-Ob42iKQ-scs9qw3F03y__OPYvc2Wac64LHw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 5, 2023 at 4:59 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I think that I agree. I think that the difficulty of tuning autovacuum
> is the actual real problem. (Or maybe it's just very closely related
> to the real problem -- the precise definition doesn't seem important.)

I agree, and I think that bad choices around what the parameters do
are a big part of the problem. autovacuum_max_workers is one example
of that, but there are a bunch of others. It's not at all intuitive
that if your database gets really big you either need to raise
autovacuum_vacuum_cost_limit or lower autovacuum_vacuum_cost_delay.
And, it's not intuitive either that raising autovacuum_max_workers
doesn't increase the amount of vacuuming that gets done. In my
experience, it's very common for people to observe that autovacuum is
running constantly, and to figure out that the number of running
workers is equal to autovacuum_max_workers at all times, and to then
conclude that they need more workers. So they raise
autovacuum_max_workers and nothing gets any better. In fact, things
might get *worse*, because the time required to complete vacuuming of
a large table can increase when the fixed I/O budget is spread across
more workers, and it's very often the time to vacuum the largest
tables that determines whether things hold together adequately or not.
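
Concretely (illustrative values, not recommendations; the defaults
cited are those of v12 and later): total autovacuum throughput is
governed by the shared cost budget, roughly cost_limit per cost_delay,
not by the worker count:

    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
        -- default is -1, meaning "use vacuum_cost_limit" (200)
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
        -- default 2ms; lowering it is the other lever
    SELECT pg_reload_conf();

    -- Adding workers, by contrast, adds no I/O budget; each worker
    -- just gets a smaller slice. (Takes effect only at server restart.)
    ALTER SYSTEM SET autovacuum_max_workers = 6;   -- default 3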

This kind of stuff drives me absolutely batty. It's impossible to make
every database behavior completely intuitive, but here we have a
parameter that seems like it is exactly the right thing to solve the
problem that the user knows they have, and it actually does nothing on
a good day and causes a regression on a bad one. That's incredibly
poor design.

The way it works at the implementation level is pretty kooky, too. The
available resources are split between the workers, but if any of the
relevant vacuum parameters are set for the table currently being
vacuumed, then that worker gets the full resources configured for that
table, and everyone else divides up the amount that's configured
globally. So if you went and set the cost delay and cost limit for all
of your tables to exactly the same values that are configured
globally, you'd vacuum 3 times faster than if you relied on the
identical global defaults (or N times faster, where N is the value
you've picked for autovacuum_max_workers). If you have one really big
table that requires continuous vacuuming, you could slow down
vacuuming on that table through manual configuration settings and
still end up speeding up vacuuming overall, because the remaining
workers would be dividing the budget implied by the default settings
among N-1 workers instead of N workers. As far as I can see, none of
this is documented, which is perhaps for the best, because IMV it
makes no sense.
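
For illustration, that workaround looks like this (hypothetical table
name; the values just restate the global defaults):

    ALTER TABLE big_busy_table
        SET (autovacuum_vacuum_cost_limit = 200,  -- global default
             autovacuum_vacuum_cost_delay = 2);   -- milliseconds, ditto

With the default autovacuum_max_workers = 3, the worker vacuuming
big_busy_table now gets the full budget to itself, and the other two
workers divide the globally configured budget between themselves
instead of among three.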

I think we need to move more toward a model where VACUUM just keeps
up. Emergency mode is a step in that direction, because the definition
of an emergency is that we're definitely not keeping up, but I think
we need something less Boolean. If the database gets bigger or smaller
or more or less active, autovacuum should somehow just adjust to that,
without so much manual fiddling. I think it's good to have the
possibility of some manual fiddling to handle problematic situations,
but you shouldn't have to do it just because you made a table bigger.

--
Robert Haas
EDB: http://www.enterprisedb.com
