autovacuum scheduling starvation and frenzy

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: autovacuum scheduling starvation and frenzy
Date: 2014-05-15 19:12:15
Message-ID: CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
Lists: pgsql-hackers

In testing 9.4 with some long-running tests, I noticed that the autovacuum
launcher/worker machinery sometimes goes a bit nuts: it vacuums the same
database repeatedly without respect for the nap time.

As far as I can tell, the behavior is the same in older versions, but I
haven't tested that.

This is my understanding of what is happening:

If you have a database with a large table in it that has just passed
autovacuum_freeze_max_age, all future workers will be funnelled into that
database until the wrap-around completes. But only one of those workers
can actually vacuum the one table which is holding back the frozenxid.
Maybe the 2nd worker to come along will find other useful work to do, but
eventually all the vacuuming that needs doing is already in progress, and
so each worker starts up, gets directed to this database, finds it can't
help, and exits. So all other databases are entirely starved of
autovacuuming for the entire duration of the wrap-around vacuuming of this
one large table.
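
For illustration, here is a minimal standalone C sketch of that selection
behavior (the struct, field names, and choose_database() are hypothetical,
not the real autovacuum.c code; 200000000 is just the GUC's default): a
database past autovacuum_freeze_max_age is chosen unconditionally, ahead of
the normal "most overdue schedule" choice, so every new worker lands there.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-database view; not the real autovacuum structs. */
typedef struct avdb_entry
{
    uint32_t oid;            /* database OID */
    int64_t  frozenxid_age;  /* age(datfrozenxid) */
    int64_t  next_worker;    /* next scheduled autovacuum time, epoch microseconds */
} avdb_entry;

static int64_t autovacuum_freeze_max_age = 200000000;   /* the GUC's default */

static avdb_entry *
choose_database(avdb_entry *dbs, int ndbs, int64_t now)
{
    avdb_entry *at_risk = NULL;
    avdb_entry *overdue = NULL;

    for (int i = 0; i < ndbs; i++)
    {
        /* any database past the freeze limit wins, regardless of its schedule */
        if (dbs[i].frozenxid_age > autovacuum_freeze_max_age &&
            (at_risk == NULL || dbs[i].frozenxid_age > at_risk->frozenxid_age))
            at_risk = &dbs[i];

        /* otherwise the most chronologically overdue database is chosen */
        if (dbs[i].next_worker <= now &&
            (overdue == NULL || dbs[i].next_worker < overdue->next_worker))
            overdue = &dbs[i];
    }

    return at_risk != NULL ? at_risk : overdue;
}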

Also, the launcher decides when to launch the next worker by looking at the
scheduled time of the least-recently-vacuumed database (with the implicit
intention that that is the one that will get chosen to vacuum next). But
since the worker gets redirected to the wrap-around database instead of the
least-recently-vacuumed database, the least-recently-vacuumed database
never gets its schedule updated and always looks like it is chronologically
overdue. That means the launcher keeps launching new workers as fast as
the previous ones exit, ignoring the nap time. So there is one long-running
worker actually making progress, plus a frenzy of workers all attacking the
same database, finding that there is nothing they can do.
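
Continuing the same hypothetical sketch (reusing the avdb_entry struct
above, not the real launcher code), the timing side of the problem looks
roughly like this: the launcher sleeps until the earliest next_worker time,
but that entry belongs to a database whose schedule never advances, so the
computed sleep collapses to zero.

static int64_t
launcher_sleep_usec(avdb_entry *dbs, int ndbs, int64_t now, int64_t naptime_usec)
{
    int64_t next_launch = now + naptime_usec;

    for (int i = 0; i < ndbs; i++)
        if (dbs[i].next_worker < next_launch)
            next_launch = dbs[i].next_worker;

    /*
     * The least-recently-vacuumed database's next_worker never moves
     * forward (its workers keep getting redirected), so once it is in the
     * past this returns 0 on every pass and the launcher fires a new
     * worker as soon as one exits.
     */
    return next_launch > now ? next_launch - now : 0;
}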

I think that a database whose age exceeds autovacuum_freeze_max_age should
get first priority, but only if its next scheduled vacuum time is in the
past. If it can beneficially use more than one vacuum worker, they would
usually accumulate there naturally within a few naptime iterations [1]. And if it
can't usefully use more than one worker, don't prevent other databases from
using them.
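
Under the same hypothetical sketch, the proposal would amount to something
like this: the at-risk database only jumps the queue when it is itself due,
otherwise the normal most-overdue choice applies and other databases still
get workers.

static avdb_entry *
choose_database_proposed(avdb_entry *dbs, int ndbs, int64_t now)
{
    avdb_entry *at_risk = NULL;
    avdb_entry *overdue = NULL;

    for (int i = 0; i < ndbs; i++)
    {
        /* jump the queue only if the at-risk database is itself due */
        if (dbs[i].frozenxid_age > autovacuum_freeze_max_age &&
            dbs[i].next_worker <= now &&
            (at_risk == NULL || dbs[i].frozenxid_age > at_risk->frozenxid_age))
            at_risk = &dbs[i];

        if (dbs[i].next_worker <= now &&
            (overdue == NULL || dbs[i].next_worker < overdue->next_worker))
            overdue = &dbs[i];
    }

    /* if the at-risk database isn't due yet, others are not starved */
    return at_risk != NULL ? at_risk : overdue;
}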

[1] You could argue that all the other max_workers processes could become
pinned down in long-running vacuums of other non-risk databases between the
time that the database crosses autovacuum_freeze_max_age (and has its first
worker started) and the time its nap time expires and it becomes eligible
for a second one. But that seems like a weak argument, as it could just as
easily have happened that all of them got pinned down in non-risk databases
a few transactions *before* the database crossed autovacuum_freeze_max_age
in the first place.

Does this analysis and proposal seem sound?

Cheers,

Jeff
