Re: another autovacuum scheduling thread

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: another autovacuum scheduling thread
Date: 2025-11-22 18:35:27
Message-ID: CA+TgmoZ44jzFBB4zxoXrX6iMxv7TPzv5duPe44UPpoyBBDL48g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 22, 2025 at 12:28 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
> What I have not been able to prove from my tests is that the processing
> order of tables by autovacuum will actually make things any better or any
> worse. My tests have been short 30 minute tests that count how many
> vacuum cycles tables with various DML activity and sizes received.
> I have not found much difference. I am also not sure how valuable
> these short-duration tests are either.

Yeah, I'm not sure that would be the right way to look for a benefit
from something like this. I think that a better test scenario might
involve figuring out how fast we can recover from a bad situation. As
we've discussed before, if VACUUM is chronically unable to keep up
with the workload, then the system is going to get into a very bad
state and there's not really any help for it. But if we start to get
into a bad situation due to some outside interference and then someone
removes the interference, we might hope that this patch would help us
get back on our feet more quickly.

For instance, suppose that we have a database with a stale replication
slot, so the oldest-XID value for the cluster keeps getting older and
older. autovacuum is probably running but it can't clean anything up.
Then at some point, the DBA realizes that bad things are happening and
drops the replication slot. You might hope that, with the patch,
autovacuum would do a better job getting the system back to a working
state. If you set up some kind of test scenario, you could ask
questions like "what is the largest age(relfrozenxid) that we observe
in the database at any point during the test?" or "from the time the
replication slot is dropped, how much time passes before
age(datfrozenxid) drops to normal?" or "what is the maximum observed
amount of bloat during the test?".
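
As a rough sketch of how the first two questions could be turned into
numbers: assume some sampling loop (not shown) periodically runs the two
catalog queries below and records the results. The Sample type, the
summarize() helper, and the normal_age threshold are hypothetical names
made up for this illustration, and what counts as a "normal" age is
something the tester would have to choose. Bloat is left out because
measuring it needs extra tooling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Queries a sampling loop could run on each tick. The catalog columns and
# age() are standard PostgreSQL; the sampling loop itself is not shown.
MAX_RELFROZENXID_AGE_SQL = """
SELECT max(age(relfrozenxid))
FROM pg_class
WHERE relkind IN ('r', 'm', 't')
"""

DATFROZENXID_AGE_SQL = """
SELECT age(datfrozenxid)
FROM pg_database
WHERE datname = current_database()
"""


@dataclass
class Sample:
    """One observation (hypothetical record type for this sketch)."""
    t: float                    # seconds since the start of the test
    max_relfrozenxid_age: int   # result of MAX_RELFROZENXID_AGE_SQL
    datfrozenxid_age: int       # result of DATFROZENXID_AGE_SQL


def summarize(
    samples: Sequence[Sample],
    slot_dropped_at: float,
    normal_age: int,
) -> Tuple[int, Optional[float]]:
    """Return (peak relfrozenxid age, seconds from slot drop to recovery).

    Assumes samples are sorted by time. Recovery time is None if
    age(datfrozenxid) never falls to normal_age or below after the drop.
    """
    peak = max(s.max_relfrozenxid_age for s in samples)
    recovery = None
    for s in samples:
        if s.t >= slot_dropped_at and s.datfrozenxid_age <= normal_age:
            recovery = s.t - slot_dropped_at
            break
    return peak, recovery
```

Running the same workload with and without the patch and comparing the
two values from summarize() would give one simple basis for comparison.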

The same kind of idea could apply to anything else that stops vacuum
from running or makes it unproductive: a full table lock on a key
table, a long-running open transaction, a table where VACUUM is failing. I
actually don't know exactly what kind of scenario would be good to
test here, because I struggle to think of a concrete scenario in which
we'd be better off with this than without it (which might be a reason
not to proceed with it, despite the fact that I think we all agree
that, from a theoretical point of view, the idea of prioritizing
sounds better than the idea of not prioritizing). But I think that if
the patch has a benefit, it won't be one where the system is in a
steady state where vacuum is able to keep up. It might be one where
we're in a steady state where vacuum is not able to keep up and things
are getting worse and worse, but the patch allows us to survive for
longer before terrible things happen. But I would say that the most
promising scenario for this patch would be something like what I
describe above, where we're not in a steady state at all: something
bad has happened and now we're trying to recover.

--
Robert Haas
EDB: http://www.enterprisedb.com
