Re: autovacuum scheduling starvation and frenzy

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum scheduling starvation and frenzy
Date: 2014-05-15 19:55:07
Message-ID: 20140515195506.GA7857@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Jeff Janes wrote:

> If you have a database with a large table in it that has just passed
> autovacuum_freeze_max_age, all future workers will be funnelled into that
> database until the wrap-around completes. But only one of those workers
> can actually vacuum the one table which is holding back the frozenxid.
> Maybe the 2nd worker to come along will find other useful work to do, but
> eventually all the vacuuming that needs doing is already in progress, and
> so each worker starts up, gets directed to this database, finds it can't
> help, and exits. So all other databases are entirely starved of
> autovacuuming for the entire duration of the wrap-around vacuuming of this
> one large table.

Bah. Of course :-(

Note that if you have two databases in danger of wraparound, the oldest
will always be chosen until it's no longer in danger. Ignoring the
second one past freeze_max_age seems bad also.

The database selection code is in autovacuum.c, do_start_worker(). I'm
not sure what your proposal looks like in terms of code. I think that
instead of trying to pick a single target database in that foreach loop,
we could build a prioritized list (databases in XID wraparound danger
first, then those in multixact wraparound danger, then the one with the
oldest autovac time of all the ones that remain); then recheck the
wraparound condition for each entry by seeing whether there are other
workers in that database that started after the wraparound condition
appeared. If there are, move down the list. The first database in the
list that is not skipped is chosen for vacuuming.

(Do we need to consider the case where all databases were skipped by
the above logic, and if so, perhaps pick the first DB in the list
anyway?)
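
To make the idea concrete, here is a rough sketch of the selection
logic described above. It is a toy model in Python, not a patch against
autovacuum.c: the Database and Worker records, the danger_since field
(some bookkeeping of when the wraparound condition appeared, which is
not spelled out above) and the fallback_to_first flag are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Database:
    name: str
    xid_age: int                 # age of the database's frozen XID
    mxid_age: int                # age of its oldest multixact
    last_autovac_time: float     # when a worker last processed it
    # Hypothetical: when the wraparound condition first appeared.
    danger_since: Optional[float] = None


@dataclass
class Worker:
    dbname: str
    start_time: float


def in_danger(db: Database, xid_limit: int, mxid_limit: int) -> bool:
    return db.xid_age > xid_limit or db.mxid_age > mxid_limit


def priority_key(db: Database, xid_limit: int, mxid_limit: int):
    # Class 0: XID wraparound danger, oldest first.
    if db.xid_age > xid_limit:
        return (0, -db.xid_age)
    # Class 1: multixact wraparound danger, oldest first.
    if db.mxid_age > mxid_limit:
        return (1, -db.mxid_age)
    # Class 2: everything else, least recently autovacuumed first.
    return (2, db.last_autovac_time)


def choose_database(dbs: List[Database], workers: List[Worker],
                    xid_limit: int, mxid_limit: int,
                    fallback_to_first: bool = False) -> Optional[Database]:
    ranked = sorted(dbs, key=lambda d: priority_key(d, xid_limit, mxid_limit))
    for db in ranked:
        if in_danger(db, xid_limit, mxid_limit) and db.danger_since is not None:
            # Recheck: a worker that started after the condition appeared
            # is presumably already handling it, so move down the list.
            if any(w.dbname == db.name and w.start_time >= db.danger_since
                   for w in workers):
                continue
        return db
    # Every database was skipped; optionally take the head of the list.
    if fallback_to_first and ranked:
        return ranked[0]
    return None
```

With one worker already busy in the database that is past
freeze_max_age, the next launch falls through to the database with the
oldest autovac time instead of being funnelled into the same place
again. Whether the fallback should be on or off is exactly the open
question in the parenthetical above.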

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
