Re: allow changing autovacuum_max_workers without restarting

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allow changing autovacuum_max_workers without restarting
Date: 2024-04-19 18:42:13
Message-ID: CA+TgmoY03ZU_BYPUDq6JCbdH0w0okB8_-tnjgvk8_6oBbf9mgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 19, 2024 at 11:43 AM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
> Removed in v2. I also noticed that I forgot to update the part about when
> autovacuum_max_workers can be changed. *facepalm*

I think this could help a bunch of users, but I'd still like to
complain, not so much with the desire to kill this patch as with the
desire to broaden the conversation.

Part of the underlying problem here is that, AFAIK, neither PostgreSQL
as a piece of software nor we as human beings who operate PostgreSQL
databases have much understanding of how autovacuum_max_workers should
be set. It's relatively easy to hose yourself by raising
autovacuum_max_workers to try to make things go faster, but produce
the exact opposite effect due to how the cost balancing stuff works.
But, even if you have the correct use case for autovacuum_max_workers,
something like a few large tables that take a long time to vacuum plus
a bunch of smaller ones that can't get starved just because the big
tables are in the midst of being processed, you might well ask
yourself why it's your job to figure out the correct number of
workers.

Now, before this patch, there is a fairly good reason for that, which
is that we need to reserve shared memory resources for each autovacuum
worker that might potentially run, and the system can't know how much
shared memory you'd like to reserve for that purpose. But if that were
the only problem, then this patch would probably just be proposing to
crank up the default value of that parameter rather than introducing a
second one. I bet Nathan isn't proposing that because his intuition is
that it will work out badly, and I think he's right. I bet that
cranking up the number of allowed workers will often result in running
more workers than we really should. One possible negative consequence
is that we'll end up with multiple processes fighting over the disk in
a situation where they should just take turns. I suspect there are
also ways that we can be harmed - in broadly similar fashion - by cost
balancing.

So I feel like what this proposal reveals is that we know that our
algorithm for ramping up the number of running workers doesn't really
work. And maybe that's just a consequence of the general problem that
we have no global information about how much vacuuming work there is
to be done at any given time, and therefore we cannot take any kind of
sensible guess about whether 1 more worker will help or hurt. Or,
maybe there's some way to do better than what we do today without a
big rewrite. I'm not sure. I don't think this patch should be burdened
with solving the general problem here. But I do think the general
problem is worth some discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-19 18:44:50 Re: fix tablespace handling in pg_combinebackup
Previous Message Tom Lane 2024-04-19 18:08:26 Re: Support a wildcard in backtrace_functions