Re: autovacuum strategy / parameters

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rick <richard(dot)branton(at)ca(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: autovacuum strategy / parameters
Date: 2010-04-30 22:50:54
Message-ID: 4BDB5ECE.4010607@agliodbs.com
Lists: pgsql-performance


> My guess is that the reason we run ANALYZE more frequently than vacuum
> (with the default settings) is that ANALYZE is pretty cheap. In many
> cases, if the statistical distribution of the data hasn't changed
> much, then it's not really necessary, but it doesn't cost much either.
> And for certain types of usage patterns, like time series (where the
> maximum value keeps increasing) it's REALLY important to analyze
> frequently.
>
> But having said that, on the systems I've worked with, I've only
> rarely seen a problem caused by not analyzing frequently enough. On
> the other hand, I've seen MANY problems caused by not vacuuming
> enough.

Which is the opposite of my experience; we currently have several
clients with issues that required more-frequent analyzes on specific
tables. Before 8.4, vacuuming more frequently, especially on large
tables, was very costly; vacuum takes a lot of I/O and CPU. Even with
8.4 it's not something you want to increase without thinking about the
tradeoffs.
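Before touching any of these knobs, it's worth checking what autovacuum
is actually doing per table. A quick sketch against the standard
statistics views (these columns exist in pg_stat_user_tables from 8.3
onward):

```sql
-- When did autovacuum last touch each table, and how much churn
-- has accumulated since? Tables with large n_dead_tup and an old
-- (or NULL) last_autovacuum are the ones worth tuning individually.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```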

Since I'm responsible for the current defaults, I thought I'd explain
the reasoning behind them. I developed and tested them while at
Greenplum, so they are *not* designed for small databases.

#autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50

These two set the minimum thresholds: they keep small tables from being
vacuumed/analyzed continuously, while still making sure that small
tables do get vacuumed and analyzed sometimes.

#autovacuum_vacuum_scale_factor = 0.2

This is set because in my experience, 20% bloat is about the level at
which bloat starts affecting performance; thus, we want to vacuum at
that level but not sooner. This does mean that very large tables which
never have more than 10% updates/deletes don't get vacuumed at all until
they hit freeze_age (autovacuum_freeze_max_age); this is a *good
thing*. VACUUM on large tables is expensive; you don't *want* to vacuum
a billion-row table which has only 100,000 updates.
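The trigger point combines both settings: autovacuum vacuums a table
once dead tuples exceed autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples (that formula is per the
PostgreSQL docs; the table sizes below are made-up examples):

```python
def vacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum vacuums a table,
    using the default threshold (50) and scale factor (0.2)."""
    return threshold + scale_factor * reltuples

# A small 1,000-row table is vacuumed after ~250 dead tuples...
print(vacuum_trigger(1_000))          # 250.0
# ...while a billion-row table waits for ~200 million dead tuples,
# so 100,000 updates (0.01% churn) come nowhere near triggering it.
print(vacuum_trigger(1_000_000_000))  # 200000050.0
```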

#autovacuum_analyze_scale_factor = 0.1

The 10% threshold for analyze is there because (a) analyze is cheap, and
(b) 10% changes to a table can result in very bad plans if the changes
are highly skewed towards a specific range, such as additions onto the
end of a time-based table.
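Since 8.4 these can be overridden per table via storage parameters,
which is how I'd handle the skewed time-series case rather than
changing the cluster-wide defaults. A sketch (the table name is
hypothetical):

```sql
-- Analyze this append-heavy table after ~1% churn instead of 10%,
-- leaving the global defaults alone for everything else.
ALTER TABLE measurements
  SET (autovacuum_analyze_scale_factor = 0.01,
       autovacuum_analyze_threshold = 500);
```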

The current postgres defaults were tested on DBT2 as well as pgbench,
and in my last 2 years of consulting I've seldom found reason to touch
them except on *specific* tables. So I still feel that they are good
defaults.

It would be worth doing a DBT2/DBT5 test run with different autovac
settings post-8.4 to see if we should specifically change the vacuum
threshold. Pending that, though, I think the current defaults are good
enough.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
