Re: another autovacuum scheduling thread

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: another autovacuum scheduling thread
Date: 2025-10-09 19:45:32
Message-ID: CAH2-Wz=vmdzeUE7PH9b0igFpJqDKY63icMWmzN=sLiyKVxyqOA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 9, 2025 at 12:15 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Each worker would consult this table before processing. If the table is
> > there, it would remove it from the shared table and skip processing it.
> > Then the next worker would try processing the table again.
> >
> > I also wonder how hard it would be to gracefully catch the error and let
> > the worker continue with the rest of its list...
>
> The main set of cases I've seen are when workers get hung up permanently in
> corrupt indexes.

How recently was this? I'm aware of problems like that, which we
discussed around 2018, but they were greatly mitigated: first by your
commit 3a01f68e, then by my commit c34787f9.

In general, there's no particularly good reason why (at least with
nbtree indexes) VACUUM should ever hang forever. The access pattern is
overwhelmingly simple, sequential access. The only exception is nbtree
page deletion (plus backtracking), where it isn't particularly hard to
just be very careful about self-deadlock.

> There never is actually an error, the autovacuums just get
> terminated as part of whatever independent reason there is to restart.

What do you mean?

In general I'd expect nbtree VACUUM of a corrupt index to either not
fail at all (we'll soldier on to the best of our ability when page
deletion encounters an inconsistency), or to get permanently stuck due
to locking the same page twice/self-deadlock (though as I said, those
problems were mitigated, and might even be almost impossible these
days). Every other case involves some kind of error (e.g., an OOM is
just about possible).

I agree with you that using a perfectly deterministic order comes
with real downsides, without any upside. Don't interpret what I've
said as expressing opposition to that idea.
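
For concreteness, the shared skip-table idea quoted at the top could
be sketched roughly as follows. This is only an illustration and not
PostgreSQL code: the names SkipTable, run_worker and vacuum are
hypothetical. It also assumes that a worker records a table before it
starts and clears the entry on success, which the quoted text does not
spell out.

```python
import threading


class SkipTable:
    """Hypothetical shared set of tables whose last attempt never finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = set()

    def mark_started(self, table):
        # Assumption: the entry is recorded before work begins.
        with self._lock:
            self._tables.add(table)

    def clear(self, table):
        # Called only when the attempt completed normally.
        with self._lock:
            self._tables.discard(table)

    def should_skip(self, table):
        # If the table is listed, remove it and skip it this one time,
        # so that a later worker tries it again.
        with self._lock:
            if table in self._tables:
                self._tables.remove(table)
                return True
            return False


def run_worker(todo, skip_table, vacuum):
    """Process a worker's list, skipping tables left over by a dead worker."""
    processed = []
    for table in todo:
        if skip_table.should_skip(table):
            continue
        skip_table.mark_started(table)
        vacuum(table)  # if the worker dies here, the entry stays behind
        skip_table.clear(table)
        processed.append(table)
    return processed
```

In this sketch a worker that dies partway through leaves its current
table listed, the next worker skips that table once, and the worker
after that retries it.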

--
Peter Geoghegan
