> I'm not especially sold on your theory that there's some behavior that
> forces such convergence, but it's certainly plausible that there was,
> say, a schema alteration applied to all of those partitions at about the
> same time. In any case, as Robert has been saying, it seems like it
> would be smart to try to get autovacuum to spread out the
> anti-wraparound work a bit better when it's faced with a lot of tables
> with similar relfrozenxid values.
Well, I think we can go even further than that. I think one of the
fundamental problems is that our "opportunistic" vacuum XID approach is
essentially broken for any table which doesn't receive continuous
updates/deletes (I think Chris Browne makes largely the same point).
The way opportunism currently works is via vacuum_freeze_table_age,
which says "if you were going to vacuum this table *anyway*, and its
relfrozenxid is # old, then full-scan it". That works fine for tables
getting constant UPDATEs to avoid hitting the wraparound deadline, but
tables which have stopped getting activity, or are insert-only, never
get full-scanned until they hit the wraparound deadline.
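As a hedged sketch of that trigger condition (the GUC name and the 150M
default are real; the "vacuuming anyway" flag and function shape are
illustrative, not the actual code):

```python
# Simplified sketch of the current opportunistic full-scan trigger.
# vacuum_freeze_table_age is the real GUC (default 150 million); the
# rest of this function is illustrative pseudologic, not PostgreSQL code.

def should_full_scan(relfrozenxid_age: int,
                     vacuuming_anyway: bool,
                     vacuum_freeze_table_age: int = 150_000_000) -> bool:
    """Full-scan (freeze) only if we were going to vacuum the table
    anyway AND its relfrozenxid is old enough."""
    return vacuuming_anyway and relfrozenxid_age > vacuum_freeze_table_age

# A constantly-updated table gets frozen opportunistically ...
print(should_full_scan(200_000_000, vacuuming_anyway=True))   # True
# ... but an idle or insert-only table never does, however old it gets:
print(should_full_scan(900_000_000, vacuuming_anyway=False))  # False
```

which is exactly the gap: the second table just coasts toward the
wraparound deadline.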
What we should have instead is some opportunism in autovacuum which says:
"If I have otherwise idle workers, and the system isn't too busy, find
the table with the oldest relfrozenxid which is over
autovacuum_max_freeze_age/2 and vacuum-full-scan it."
The key difficulty there is "if the system isn't too busy". That's a
hard thing to determine, and subject to frequent change. An
opportunistic solution would still be useful without that requirement,
but not as helpful.
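The selection rule itself is simple; a hedged sketch (the
autovacuum_max_freeze_age/2 threshold is from the proposal above, the
table snapshot and function are hypothetical, and the "system isn't too
busy" check is deliberately left out, per the difficulty just noted):

```python
# Sketch of the proposed opportunistic picker: given an idle worker,
# choose the table with the oldest relfrozenxid whose XID age exceeds
# autovacuum_max_freeze_age / 2. "tables" is a hypothetical snapshot of
# (name, relfrozenxid_age) pairs; 200 million is the default max age.

def pick_wraparound_candidate(tables, autovacuum_max_freeze_age=200_000_000):
    threshold = autovacuum_max_freeze_age // 2
    eligible = [(name, age) for name, age in tables if age > threshold]
    if not eligible:
        return None  # nothing old enough; the idle worker stays idle
    # Oldest relfrozenxid == largest XID age.
    return max(eligible, key=lambda t: t[1])[0]

tables = [("hot_table", 30_000_000),
          ("cold_archive", 180_000_000),
          ("insert_only", 120_000_000)]
print(pick_wraparound_candidate(tables))  # cold_archive
```

The point of the /2 threshold is to start the full scan with plenty of
headroom, instead of waiting for the hard autovacuum_max_freeze_age
deadline to force all the old-cold tables at once.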
I don't find Stephen's proposal of goal-based solutions to be practical.
A goal-based approach makes the assumption that database activity is
predictable, and IME most databases are anything but.
A second obstacle to "opportunistic wraparound vacuum" is that
wraparound vacuum is not interruptible. If you have to kill it off and
do something else for a couple hours, it can't pick up where it left
off; it needs to scan the whole table from the beginning again.
> I continue to maintain that this problem is unrelated to wraparound as
> such, and that thinking it is is a great way to design a bad solution.
> There are any number of reasons why autovacuum might need to run
> max_workers at once. What we need to look at is making sure that they
> don't run the system into the ground when that happens.
> Since your users weren't complaining about performance with one or two
> autovac workers running (were they?),
No, it's when we hit 3 that it fell over. Thresholds vary with memory
and table size, of course.
BTW, the primary reason I think (based on a glance at system stats) this
drove the system to its knees was that the simultaneous wraparound
vacuum of 3 old-cold tables evicted all of the "current" data out of the
FS cache, forcing user queries which would normally hit the FS cache
onto disk. I/O throughput was NOT at 100% capacity.
During busy periods, a single wraparound vacuum wasn't enough to clear
the FS cache because it's competing on equal terms with user access to
data. But three avworkers "ganged up" on the user queries and kicked
the tar out of them.
Unfortunately, for the 5-worker system, I didn't find out about the
issue until after it was over, and I know it was related to wraparound
only because we were logging autovacuum. So I don't know if it had the
same cache-eviction pattern.
There are also problems with our defaults and measurements for the
various vacuum_freeze settings, but changing those won't really fix the
underlying problem, so it's not worth fiddling with them.
The other solution, as mentioned last year, is to come up with a way in
which old-cold data doesn't need to be vacuumed *at all*. This would
be the ideal solution, but it's not clear how to implement it, since any
wraparound-counting solution would bloat the CLOG intolerably.
> we can assume that the cost-delay
> settings were such as to not create a problem in that scenario. So it
> seems to me that it's down to autovac_balance_cost(). Either there's
> a plain-vanilla bug in there, or seek costs are breaking the assumption
> that it's okay to give N workers each 1/Nth of the single-worker I/O
> capacity.
Yeah, I think our I/O balancing approach was too simplistic to deal with
situations like this one. Factors I think break it are:
* changing cost-limit/cost-delay doesn't translate 1:1 into a change in
actual I/O (in fact, it seems highly unlikely that it does)
* seek costs, as you mention
* FS cache issues and competition with user queries (per above)
> As far as bugs are concerned, I wonder if the premise of the calculation
> * The idea here is that we ration out I/O equally. The amount of I/O
> * that a worker can consume is determined by cost_limit/cost_delay, so we
> * try to equalize those ratios rather than the raw limit settings.
> might be wrong in itself? The ratio idea seems plausible but ...
Well, I think it's "plausible but wrong under at least some common
circumstances". In addition to seeking, it ignores FS cache effects
(not that I have any idea how to account for these mathematically). It
also makes the assumption that 3 autovacuum workers running at 1/3 speed
each is better than having one worker running at full speed, which is
debatable. And it makes the assumption that the main thing autovac
needs to share I/O with is itself ... instead of with user queries.
I'm not saying I have a formula which is better, or that we should junk
that logic and go back to not allocating at all. But we should see if
we can figure out something better. Lemme think about it.
PostgreSQL Experts Inc.