PostgreSQL databases require periodic
maintenance known as vacuuming, and require
periodic updates to the statistics used by the
PostgreSQL query planner. These
maintenance tasks are performed by the VACUUM
and ANALYZE
commands
respectively. For most installations, it is sufficient to let the
autovacuum daemon determine when to perform
these maintenance tasks (which is partly determined by configurable
table-level thresholds; see Section 25.1).
The autovacuum daemon has to process each table on a regular basis for several reasons:

1. To recover or reuse disk space occupied by updated or deleted rows.
2. To update the visibility map, which speeds up index-only scans.
3. To allow future VACUUM
operations to avoid needlessly scanning pages that are already
frozen.
4. To protect against loss of very old data due to transaction ID
wraparound or multixact ID wraparound.
5. To update the data statistics used by the
PostgreSQL query planner.
Maintenance work within the scope of items 1, 2, 3, and 4 is
performed by the VACUUM
command internally.
Item 5 (maintenance of planner statistics) is handled by the
ANALYZE
command internally. Although this
section presents information about autovacuum, there is no
difference between manually-issued VACUUM
and
ANALYZE
commands and those run by the autovacuum
daemon (though there are autovacuum-specific variants of a small
number of settings that control VACUUM
).
Autovacuum creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.
Some database administrators will want to supplement the daemon's
activities with manually-managed VACUUM
commands, which typically are executed according to a schedule by
cron or Task
Scheduler scripts. It can be useful to perform
off-hours VACUUM
commands during periods where
reduced load is expected.
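For example, a cron-driven script might invoke psql during a nightly low-load window to issue commands along these lines (the table names are hypothetical):

```sql
-- Run during a nightly low-load window, e.g. from cron via psql -f.
-- "orders" and "order_lines" are hypothetical heavily-updated tables.
VACUUM (ANALYZE, VERBOSE) orders;
VACUUM (ANALYZE, VERBOSE) order_lines;
```

Scheduling the work explicitly like this lets the administrator pick the tables and the time of day, while the autovacuum daemon continues to act as a backstop for anything the script misses.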
In PostgreSQL, an
UPDATE
or DELETE
of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (MVCC, see Chapter 13): the row version
must not be deleted while it is still potentially visible to other
transactions. A deleted row version (whether from an
UPDATE
or DELETE
) will
usually cease to be of interest to any still running transaction
shortly after the original deleting transaction commits.
The space dead tuples occupy must eventually be reclaimed for
reuse by new rows, to avoid unbounded growth of disk space
requirements. Reclaiming space from dead rows is
VACUUM
's main responsibility.
The XID cutoff point that VACUUM
uses to
determine whether or not a deleted tuple is safe to physically
remove is reported under removable cutoff
in
the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum.
Tuples that are not yet safe to remove are counted as
dead but not yet removable
tuples in the log
report. VACUUM
establishes its
removable cutoff
once, at the start of the
operation. Any older snapshot (or transaction that allocates an
XID) that's still running when the cutoff is established may hold
it back.
It's important that no long-running transaction ever be allowed
to hold back every VACUUM
operation's cutoff
for an extended period. You may wish to monitor for this.
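One way to monitor for this is to query pg_stat_activity for sessions holding old snapshots or transaction IDs. The following query is a monitoring sketch, not a definitive recipe:

```sql
-- Sessions whose snapshot (backend_xmin) or assigned XID (backend_xid)
-- may be holding back VACUUM's removable cutoff, oldest first.
SELECT pid, datname, state, xact_start,
       age(backend_xmin) AS xmin_age,
       age(backend_xid)  AS xid_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
ORDER BY greatest(age(backend_xmin), age(backend_xid)) DESC;
```

Sessions near the top of the output with a very old xact_start are candidates for investigation.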
Tuples inserted by aborted transactions can be removed by
VACUUM
immediately.
VACUUM
will not return space to the operating
system, except in the special case where a group of contiguous
pages at the end of a table become entirely free and an exclusive
table lock can be easily obtained. This relation truncation
behavior can be disabled in tables where the exclusive lock is
disruptive by setting the table's vacuum_truncate
storage parameter to off
.
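The storage parameter can be set with ALTER TABLE; for example (the table name is hypothetical):

```sql
-- Disable end-of-table truncation for a table where even a brief
-- ACCESS EXCLUSIVE lock would be disruptive:
ALTER TABLE busy_queue SET (vacuum_truncate = off);
```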
If you have a table whose entire contents are deleted on a
periodic basis, consider doing it with TRUNCATE
rather
than relying on VACUUM
.
TRUNCATE
removes the entire contents of the
table immediately, avoiding the need to set
xmax
to the deleting transaction's XID.
One disadvantage is that strict MVCC semantics are violated.
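For instance, a staging table whose contents are fully reloaded each night might be emptied like this (the table name is hypothetical):

```sql
-- TRUNCATE reclaims the space immediately and leaves no dead tuples
-- behind, unlike DELETE followed by VACUUM:
TRUNCATE TABLE nightly_staging;
```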
VACUUM FULL
or CLUSTER
can
be useful when dealing with extreme amounts of dead tuples. Either
command can reclaim more disk space, but runs much more slowly,
since it writes an entire new copy of the table and rebuilds all indexes.
This typically has much higher overhead than
VACUUM
. Generally, therefore, administrators
should avoid using VACUUM FULL
except in the
most extreme cases.
Although VACUUM FULL
is technically an option
of the VACUUM
command, VACUUM
FULL
uses a completely different implementation.
VACUUM FULL
is essentially a variant of
CLUSTER
. (The name VACUUM
FULL
is historical; the original implementation was
somewhat closer to standard VACUUM
.)
TRUNCATE
, VACUUM FULL
, and
CLUSTER
all require an ACCESS
EXCLUSIVE
lock, which can be highly disruptive
(SELECT
, INSERT
,
UPDATE
, and DELETE
commands
will all be blocked).
VACUUM FULL
(and CLUSTER
)
temporarily uses extra disk space approximately equal to the size
of the table, since the old copies of the table and indexes can't
be released until the new ones are complete.
PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID numbers (XID) to determine whether or not the row is visible to each query's MVCC snapshot (see interpreting XID stamps from tuple headers). But since on-disk storage of transaction IDs in heap pages uses a truncated 32-bit representation to save space (rather than the full 64-bit representation), it is necessary to vacuum every table in every database at least once every two billion transactions (though far more frequent vacuuming is typical).
VACUUM
marks pages as
frozen, indicating that all eligible rows on
the page were inserted by a transaction that committed
sufficiently far in the past that the effects of the inserting
transaction are certain to be visible to all current and future
transactions. Frozen rows are treated as if the inserting
transaction (the heap tuple's xmin
XID)
committed “infinitely far in the past”, making the
effects of the transaction visible to all current and future MVCC
snapshots, forever. Once frozen, a page is “self
contained” in the sense that all of its rows can be read
without ever having to consult externally stored transaction status
metadata (for example, transaction commit status information from
pg_xact
is not required).
Without freezing, the system will eventually refuse to allocate new transaction
IDs, making DML statements throw errors
until such time as VACUUM
can restore the
system's ability to allocate new transaction IDs. The system is
unable to recognize distances of more than about 2.1 billion
transactions between any two unfrozen XIDs. The only safe option
that remains is to disallow allocating new XIDs until
VACUUM
has performed the steps required to make
sure that the “distance” invariant is never violated.
vacuum_freeze_min_age can be used to control
when freezing takes place. When VACUUM
scans a
heap page containing even one XID that already attained an age
exceeding this value, the page is frozen. All tuples from the page
are frozen, making the page suitable for long-term storage (there
is no longer any possible need to interpret the contents of the
page using external transaction status information). Increasing
this setting may avoid unnecessary work if the pages that would
otherwise be frozen will soon be modified again, but decreasing
this setting makes it more likely that some future
VACUUM
operation will need to perform an
excessive amount of “catch-up freezing”, all in one go.
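The threshold can be adjusted globally, for a single session before a manual VACUUM, or per table via the corresponding autovacuum storage parameter. For example (the table name is hypothetical):

```sql
-- Freeze pages more eagerly during this manual VACUUM only:
SET vacuum_freeze_min_age = 1000000;
VACUUM accounts;

-- Or override the threshold used by autovacuum for one table:
ALTER TABLE accounts SET (autovacuum_freeze_min_age = 1000000);
```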
Multixact IDs are used to support row
locking by multiple transactions. Since there is only limited
space in a tuple header to store lock information, that
information is encoded as a “multiple transaction
ID”, or multixact ID for short, whenever there is more
than one transaction concurrently locking a row. Information
about which transaction IDs are included in any particular
multixact ID is stored separately, and only the multixact ID
appears in the xmax
field in the tuple
header. Like transaction IDs, multixact IDs are implemented as a
32-bit counter and corresponding storage.
“Freezing” of multixact IDs is also required, though
this actually means setting xmax
to
InvalidTransactionId
.
(Actually, any xmax
field is processed in the same way when its page is frozen,
including xmax
fields that contain
simple XIDs.)
vacuum_multixact_freeze_min_age can also
influence which pages VACUUM
freezes. Like
vacuum_freeze_min_age
, the setting triggers
page-level freezing. However,
vacuum_multixact_freeze_min_age
uses
MultiXactIds (not XIDs) to measure “age”.
In PostgreSQL versions before 16,
freezing was triggered at the level of individual
xmin
and
xmax
fields. The decision to freeze
(or not freeze) is now made at the level of whole heap pages, at
the point that they are scanned by VACUUM
.
In general, the major cost of freezing is the additional
WAL volume. That's why
VACUUM
doesn't just freeze every eligible tuple
at the earliest opportunity: freezing will go to waste in cases
where a recently frozen tuple soon goes on to be deleted anyway.
Managing the added WAL volume from freezing
over time is an important consideration for
VACUUM
.
VACUUM
also triggers freezing of pages in cases
where it already proved necessary to write out an
FPI to the WAL as torn page
protection (as part of removing dead tuples). The extra
WAL volume from proactive freezing is
insignificant compared to the cost of the FPI.
It is very likely (though not quite certain) that the overall
volume of WAL will be lower in the long term
with tables that have most freezing triggered by the
FPI mechanism, since (at least on average)
future VACUUM
s shouldn't have to write a second
FPI out much later on, when freezing becomes
strictly necessary. The FPI freezing mechanism
is just an alternative trigger criterion for freezing all eligible
tuples on the page. In general, VACUUM
freezes
pages without regard to the condition that triggered freezing.
The physical modifications to the page (how tuples are processed)
are decoupled from the mechanism that decides if those
modifications should happen at all.
VACUUM
may not be able to freeze every tuple's
xmin
in relatively rare cases. The
criterion that determines eligibility for freezing is exactly the
same as the one that determines if a deleted tuple should be
considered removable
or merely dead
but not yet removable
(namely, the XID-based
removable cutoff
).
VACUUM's aggressive strategy
To track the age of the oldest unfrozen XIDs in a database,
VACUUM
stores XID statistics in the system
tables pg_class
and
pg_database
. In particular, the
relfrozenxid
column of a table's
pg_class
row contains the oldest
remaining unfrozen XID at the end of the most recent
VACUUM
that successfully advanced
relfrozenxid
. Similarly, the
datfrozenxid
column of a database's
pg_database
row is a lower bound on the
unfrozen XIDs appearing in that database — it is just the
minimum of the per-table relfrozenxid
values within the database.
VACUUM
uses the visibility map to determine which
pages of a table must be scanned. Normally, it will skip pages
that don't have any dead row versions even if those pages might
still have row versions with old XID values. Therefore, normal
VACUUM
s usually won't freeze
every page with an old row version in the
table. Most individual tables will eventually need an
aggressive vacuum, which will reliably
freeze all pages with XID and MXID values older than
vacuum_freeze_min_age
, including those from
all-visible but not all-frozen pages. vacuum_freeze_table_age controls when
VACUUM
must use its aggressive strategy.
Since the setting is applied against
age(relfrozenxid)
, settings like
vacuum_freeze_min_age
may influence the exact
cadence of aggressive vacuuming. Setting
vacuum_freeze_table_age
to 0 forces
VACUUM
to always use its aggressive strategy.
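For example, to force an aggressive VACUUM of a single table in the current session (the table name is hypothetical):

```sql
-- Setting vacuum_freeze_table_age to 0 makes this VACUUM aggressive.
-- VACUUM (FREEZE) goes further: it also sets vacuum_freeze_min_age
-- to 0, freezing every eligible tuple it scans.
SET vacuum_freeze_table_age = 0;
VACUUM accounts;
```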
In practice most tables require periodic aggressive vacuuming.
However, some individual non-aggressive
VACUUM
operations may be able to advance
relfrozenxid
and/or
relminmxid
. This is most common in
small, frequently modified tables, where
VACUUM
happens to scan all pages (or at least
all pages not marked all-frozen) in the course of removing dead
tuples.
The maximum time that a table can go unvacuumed is about two billion transactions. If it were to go unvacuumed for longer than that, the system will refuse to allocate new transaction IDs, temporarily rendering the database read-only. To ensure that this never happens, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age. (This will happen even if autovacuum is disabled.)
In practice all anti-wraparound autovacuums will use
VACUUM
's aggressive strategy. This is assured
because the effective value of
vacuum_freeze_table_age
is
“clamped” to a value no greater than 95% of the
current value of autovacuum_freeze_max_age
.
As a rule of thumb, vacuum_freeze_table_age
should be set to a value somewhat below
autovacuum_freeze_max_age
, leaving enough gap
so that a regularly scheduled VACUUM
or an
autovacuum triggered by inserts, updates and deletes is run in
that window. Anti-wraparound autovacuums can be avoided
altogether in tables that reliably receive
some VACUUM
s that use the
aggressive strategy.
There is no fundamental difference between a
VACUUM
run during anti-wraparound
autovacuuming and a VACUUM
that happens to
use the aggressive strategy (whether run by autovacuum or
manually issued).
A convenient way to examine information about
relfrozenxid
and
relminmxid
is to execute queries such as:
SELECT c.oid::regclass as table_name,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as xid_age,
       mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname, age(datfrozenxid) as xid_age, mxid_age(datminmxid)
FROM pg_database;
The age
column measures the number of
transactions from the cutoff XID to the next unallocated
transactions XID. The mxid_age
column
measures the number of MultiXactIds from the cutoff MultiXactId
to the next unallocated MultiXactId.
When the VACUUM
command's
VERBOSE
parameter is specified,
VACUUM
prints various statistics about the
table. This includes information about how
relfrozenxid
and
relminmxid
advanced, and the number
of newly frozen pages. The same details appear in the server
log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum.
xidStopLimit mode
If for some reason autovacuum utterly fails to advance any
table's relfrozenxid
or
relminmxid
, the system will begin to
emit warning messages like this when the database's oldest XIDs
reach forty million transactions from the wraparound point:
WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
(A manual VACUUM
should fix the problem, as suggested by the
hint; but note that the VACUUM
must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's datfrozenxid
.)
If these warnings are ignored, the system will eventually refuse
to start any new transactions. This happens at the point that
there are fewer than three million transactions left:
ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
required VACUUM
commands. However, since the system will not
execute commands once it has gone into the safety shutdown mode,
the only way to do this is to stop the server and restart it in single-user
mode to execute VACUUM
. The shutdown mode is not enforced
in single-user mode. See the postgres reference
page for details about using single-user mode.
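In outline, the recovery procedure looks like this (the data directory path is illustrative):

```shell
# Stop the server, then run a standalone backend on the affected
# database; the --single flag starts single-user mode:
postgres --single -D /usr/local/pgsql/data mydb
```

At the `backend>` prompt, enter `VACUUM;`, then exit with end-of-file (usually control-D) to shut the standalone backend down cleanly.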
Anything that influences when and how
relfrozenxid
and
relminmxid
advance will also directly
affect the high watermark storage overhead from storing a great
deal of historical transaction status information. The
additional space
overhead is usually of fairly minimal concern. It is
noted as an additional downside of allowing the system to get
close to xidStopLimit
for the sake of
completeness.
In emergencies, VACUUM
will take extraordinary
measures to avoid xidStopLimit
mode. A
failsafe mechanism is triggered when thresholds controlled by
vacuum_failsafe_age and vacuum_multixact_failsafe_age are reached. The
failsafe prioritizes advancing
relfrozenxid
and/or
relminmxid
as quickly as possible.
Once the failsafe triggers, VACUUM
bypasses
all remaining non-essential maintenance tasks, and stops applying
any cost-based delay that was in effect. Any Buffer Access
Strategy in use will also be disabled.
Vacuum maintains a visibility
map for each table to keep track of which pages contain
only tuples that are known to be visible to all active
transactions (and all future transactions, until the page is again
modified). This has two purposes. First, vacuum itself can skip
such pages on the next run, since there is nothing to clean up.
Even VACUUM
s that use the aggressive strategy can skip
pages that are both all-visible and all-frozen (the visibility map
keeps track of which pages are all-frozen separately).
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
As noted in Section 25.2.2.2, anything that
influences when and how relfrozenxid
and
relminmxid
advance will also directly
affect the high watermark storage overhead needed to store
historical transaction status information. For example,
increasing autovacuum_freeze_max_age
(and
vacuum_freeze_table_age
along with it) will
make the pg_xact
and
pg_commit_ts
subdirectories of the database
cluster take more space, because they store the commit status and
(if track_commit_timestamp
is enabled)
timestamp of all transactions back to the
datfrozenxid
horizon (the earliest
datfrozenxid
in the entire cluster).
The commit status uses two bits per transaction. The default
autovacuum_freeze_max_age
setting of 200
million transactions translates to about 50MB of
pg_xact
storage. When
track_commit_timestamp
is enabled, about 2GB of
pg_commit_ts
storage will also be required.
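The arithmetic can be checked directly: at two bits per transaction, pg_xact stores four transactions per byte, while pg_commit_ts stores roughly 10 bytes per transaction (a timestamp plus a replication origin identifier):

```sql
-- 200 million transactions at 2 bits each in pg_xact:
SELECT pg_size_pretty((200000000 / 4)::bigint);   -- about 48 MB

-- 200 million transactions at roughly 10 bytes each in pg_commit_ts:
SELECT pg_size_pretty((200000000 * 10)::bigint);  -- about 1.9 GB
```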
MultiXactId status information is implemented as two separate
SLRU storage areas:
pg_multixact/members
, and
pg_multixact/offsets
. There is no simple
formula to determine the storage overhead per MultiXactId, since
MultiXactIds have a variable number of member XIDs.
Truncating of transaction status information is only possible at
the end of VACUUM
s that advance
relfrozenxid
(in the case of
pg_xact
and
pg_commit_ts
) or
relminmxid
(in the case of
(pg_multixact/members
and
pg_multixact/offsets
) of whatever table
happened to have the oldest value in the cluster when the
VACUUM
began. This typically happens very
infrequently, often during aggressive strategy
VACUUM
s of one of the database's largest
tables.
The PostgreSQL query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the ANALYZE
command,
which can be invoked by itself or
as an optional step in VACUUM
. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.
The autovacuum daemon, if enabled, will automatically issue
ANALYZE
commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled ANALYZE
operations, particularly
if it is known that update activity on a table will not affect the
statistics of “interesting” columns. The daemon schedules
ANALYZE
strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to
run ANALYZE
on the parent table manually in order to
keep the statistics up to date.
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a timestamp
column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
It is possible to run ANALYZE
on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. ANALYZE
uses a
statistically random sampling of the rows of a table rather than reading
every single row.
Although per-column tweaking of ANALYZE
frequency might not be
very productive, you might find it worthwhile to do per-column
adjustment of the level of detail of the statistics collected by
ANALYZE
. Columns that are heavily used in WHERE
clauses and have highly irregular data distributions might require a
finer-grain data histogram than other columns. See ALTER TABLE
SET STATISTICS
, or change the database-wide default using the default_statistics_target configuration parameter.
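For example (the table and column names are hypothetical):

```sql
-- Collect a finer-grained histogram for a column that is heavily
-- used in WHERE clauses, then refresh the statistics:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```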
Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE
commands for
foreign tables, since it has no means of determining how often that
might be useful. If your queries require statistics on foreign tables
for proper planning, it's a good idea to run manually-managed
ANALYZE
commands on those tables on a suitable schedule.
The autovacuum daemon does not issue ANALYZE
commands
for partitioned tables. Inheritance parents will only be analyzed if the
parent itself is changed; changes to child tables do not trigger
autoanalyze on the parent table. If your queries require statistics on
parent tables for proper planning, it is necessary to periodically run
a manual ANALYZE
on those tables to keep the statistics
up to date.
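For example, a scheduled job might refresh the statistics that autovacuum never gathers (the table names are hypothetical):

```sql
-- Gather statistics for the partitioned table itself; its individual
-- partitions are still analyzed by autovacuum:
ANALYZE orders_partitioned;

-- Likewise for a foreign table, which autovacuum never analyzes:
ANALYZE remote_orders;
```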