PostgreSQL databases require periodic maintenance known as vacuuming, as well as periodic updates to the statistics used by the PostgreSQL query planner. The VACUUM and ANALYZE commands perform these maintenance tasks. The autovacuum daemon automatically schedules execution of this maintenance based on the requirements of the workload.
The autovacuum daemon has to process each table regularly for several reasons:

1. To recover or reuse disk space occupied by dead row versions.
2. To set pages all-visible (and all-frozen) in the visibility map, which speeds up index-only scans and helps each VACUUM operation avoid needlessly scanning already-frozen pages.
3. To freeze older transaction IDs (XIDs) and MultiXact IDs.
4. To advance the table's relfrozenxid and relminmxid, protecting against transaction ID wraparound.
5. To update the statistics used by the PostgreSQL query planner.

Maintenance work within the scope of items 1, 2, 3, and 4 is performed internally by the VACUUM command. The ANALYZE command handles maintenance work within the scope of item 5 (maintenance of planner statistics) internally.
Generally speaking, database administrators who are new to tuning autovacuum should start by adjusting autovacuum's scheduling. Autovacuum scheduling is controlled via threshold settings that determine when autovacuum should launch a worker to run VACUUM and/or ANALYZE; see Section 25.1. This section provides additional information about the design and goals of autovacuum, VACUUM, and ANALYZE. The intended audience is database administrators who wish to perform more advanced tuning of autovacuum, with any of the following goals in mind:
- Tuning VACUUM to improve query response times.
- Making sure that VACUUM's management of the transaction ID address space is operating normally.
- Tuning VACUUM for performance stability.
With larger installations, tuning autovacuum usually won't be a one-off task; it is best to approach tuning as an iterative process.
Autovacuum creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that you can adjust to reduce the performance impact of background vacuuming. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.
Some database administrators will want to supplement the daemon's activities with manually managed VACUUM commands. Scheduling tools such as cron or Task Scheduler can help with this.
It can be useful to run VACUUM commands during off-hours periods when reduced load is expected. Almost all of the contents of this section apply equally to manually issued VACUUM and ANALYZE operations.
In PostgreSQL, an
UPDATE
or DELETE
of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (MVCC, see Chapter 13): the row version
must not be deleted while it is still potentially visible to other
transactions. A deleted row version (whether from an UPDATE or DELETE) will usually cease to be of interest to any still-running transaction shortly after the original deleting transaction commits.
The space dead tuples occupy must eventually be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. Reclaiming space from dead rows is VACUUM's main responsibility.
The transaction ID number
(XID) based cutoff point that
VACUUM
uses to determine if a deleted tuple is
safe to physically remove is reported under removable
cutoff
in the server log when autovacuum logging
(controlled by log_autovacuum_min_duration)
reports on a VACUUM
operation executed by
autovacuum. Tuples that are not yet safe to remove are counted as
dead but not yet removable
tuples in the log
report. VACUUM
establishes its
removable cutoff
once, at the start of the
operation. Any older MVCC snapshot (or
transaction that allocates an XID) that's still running when the
cutoff is established may hold it back.
It's important that no long-running transactions ever be allowed
to hold back every VACUUM
operation's cutoff
for an extended period. You may wish to add monitoring to alert
on this.
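As a sketch of such monitoring (the one-hour threshold and column selection are arbitrary illustrations, not a recommendation), a query against pg_stat_activity can report sessions whose open transactions may be holding back the removable cutoff:

```sql
-- List sessions with a transaction open for more than one hour
-- (threshold chosen arbitrarily for illustration).
SELECT pid, datname, usename,
       now() - xact_start AS xact_duration,
       backend_xid, backend_xmin
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;
```

Such a query could feed an alerting system, or simply be run by hand when autovacuum log reports show many "dead but not yet removable" tuples.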
VACUUM can remove tuples inserted by aborted transactions immediately, since such tuples can never be visible to any transaction.
VACUUM usually won't return space to the operating system. There is one exception: space is returned to the OS when a group of contiguous empty pages appears at the end of a table. VACUUM must acquire an ACCESS EXCLUSIVE lock to perform this relation truncation. You can disable relation truncation by setting the table's vacuum_truncate storage parameter to off.
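For example (the table name here is hypothetical), truncation can be disabled per table, or skipped for a single run:

```sql
-- Permanently disable end-of-table truncation for this table:
ALTER TABLE measurements SET (vacuum_truncate = off);

-- Or skip truncation for just one VACUUM run:
VACUUM (TRUNCATE false) measurements;
```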
If you have a table whose entire contents are deleted on a periodic basis, consider doing so with TRUNCATE rather than relying on VACUUM. TRUNCATE removes the entire contents of the table immediately, avoiding the need to set each row's xmax to the deleting transaction's XID. One disadvantage is that strict MVCC semantics are violated.
VACUUM FULL or CLUSTER can be useful when dealing with extreme amounts of dead tuples. Either command can reclaim more disk space than VACUUM, but runs much more slowly: it writes an entire new copy of the table and rebuilds all of the table's indexes. As a result, VACUUM FULL and CLUSTER typically have much higher overhead than VACUUM. Generally, therefore, administrators should avoid using VACUUM FULL except in the most extreme cases.
Although VACUUM FULL is technically an option of the VACUUM command, VACUUM FULL uses a completely different implementation; it is essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the original implementation was somewhat closer to standard VACUUM.)
TRUNCATE, VACUUM FULL, and CLUSTER all require an ACCESS EXCLUSIVE lock, which can be highly disruptive (SELECT, INSERT, UPDATE, and DELETE commands won't be able to run at the same time).
VACUUM FULL
and CLUSTER
temporarily use extra disk space. The extra space required is
approximately equal to the size of the table, since the old
copies of the table and indexes can't be released until the new
ones are complete.
VACUUM
often marks some of the pages that it
scans frozen, indicating that all eligible
rows on the page were inserted by a transaction that committed
sufficiently far in the past that the effects of the inserting
transaction are certain to be visible to all current and future
transactions. The specific Transaction ID number
(XID) stored in a frozen heap row's
xmin
field is no longer needed to
determine anything about the row's visibility. Furthermore, when
a row undergoing freezing happens to have an XID set in its
xmax
field (possibly an XID left behind
by an earlier SELECT FOR UPDATE
row locker),
the xmax
field's XID is usually also
removed.
Once frozen, heap pages are “self-contained”. Every
query can read all of the page's rows in a way that assumes that
the inserting transaction committed and is visible to its
MVCC snapshot. No query will ever have to
consult external transaction status metadata to interpret the
page's contents, either. In particular,
pg_xact
transaction XID commit/abort status
lookups won't take place during query execution.
Freezing is a WAL-logged operation, so when
VACUUM
freezes a heap page, any copy of the
page located on a physical replication standby server will itself
be “frozen” shortly thereafter (when the relevant
FREEZE_PAGE
WAL record is
replayed on the standby). Queries that run on physical
replication standbys thereby avoid pg_xact
lookups when reading from frozen pages, in just the same way as
queries that run on the primary server
[15].
It can be useful for VACUUM
to put off some of
the work of freezing, but VACUUM
cannot put off
freezing forever. Since on-disk storage of transaction IDs in
heap row headers uses a truncated 32-bit representation to save
space (rather than the full 64-bit representation), freezing plays
a crucial role in enabling management of the XID address
space by VACUUM
. If, for whatever
reason, VACUUM
is unable to freeze older XIDs
on behalf of an application that continues to require new XID
allocations, the system will eventually
refuse to allocate new transaction IDs.
The system generally only enters this state when autovacuum is
misconfigured.
vacuum_freeze_min_age controls when freezing
takes place. When VACUUM
scans a heap page
containing even one XID that has already attained an age exceeding
this value, the page is frozen.
MultiXact IDs are used to support row
locking by multiple transactions. Since there is only limited
space in a tuple header to store lock information, that
information is encoded as a “multiple transaction
ID”, or MultiXact ID for short, whenever there is more
than one transaction concurrently locking a row. Information
about which transaction IDs are included in any particular
MultiXact ID is stored separately in
pg_multixact
, and only the MultiXact ID
itself (a 32-bit unsigned integer) appears in the tuple's
xmax
field. This creates a dependency on external transaction status information similar to the dependency that ordinary unfrozen XIDs have on commit status information stored in pg_xact.
VACUUM must therefore occasionally remove MultiXact IDs from tuples during freezing. vacuum_multixact_freeze_min_age also controls when freezing takes place. It is analogous to vacuum_freeze_min_age, but “age” is expressed in units of MultiXact IDs.
Lowering vacuum_multixact_freeze_min_age forces VACUUM to process xmax fields containing a MultiXact ID in cases where it would otherwise opt to put off the work of processing xmax until the next VACUUM [16]. The setting generally doesn't significantly influence the total number of pages VACUUM freezes, even in tables that contain relatively many MultiXact IDs. This is because VACUUM generally prefers to proactively process individual xmax fields that contain a MultiXact ID (eager proactive processing is typically cheaper).
Managing the added WAL volume from freezing
over time is an important consideration for
VACUUM
. It is why VACUUM
doesn't just freeze every eligible tuple at the earliest
opportunity: the WAL written to freeze a page's
tuples “goes to waste” in cases where the resulting
frozen tuples are soon deleted or updated anyway. It's also why
VACUUM
will freeze all
eligible tuples from a heap page once the decision to freeze at
least one tuple is taken: at that point the added cost to freeze
all eligible tuples eagerly (measured in “extra bytes of
WAL written”) is far lower than the
probable cost of deferring freezing until a future
VACUUM
operation against the same table.
Furthermore, once the page is frozen it can generally be marked as
all-frozen in the visibility map right away.
In PostgreSQL versions before 16,
VACUUM
triggered freezing at the level of
individual xmin
and
xmax
fields. Freezing only affected
the exact XIDs that had already attained an age of
vacuum_freeze_min_age
or greater.
VACUUM
also triggers freezing of a page in
cases where it already proved necessary to write out a full page
image (FPI) as part of a WAL
record describing how dead tuples were removed [17] (see Section 30.1 for background
information about how FPIs provide torn page
protection). This “freeze on an FPI
write” batching mechanism often avoids the need for some
future VACUUM
operation to write an additional
FPI for the same page as part of a
WAL record describing how live tuples were
frozen. In effect, VACUUM
writes slightly more
WAL in the short term with the aim of
ultimately needing to write much less WAL in
the long term.
For tables that receive INSERT operations but few or no UPDATE/DELETE operations, it may be beneficial to selectively lower autovacuum_freeze_min_age for the table. VACUUM may thereby be able to freeze the table's pages “eagerly” during earlier autovacuums triggered by autovacuum_vacuum_insert_scale_factor.
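For example, an append-only table (the name is hypothetical) could be set up so that insert-triggered autovacuums freeze eligible pages at the earliest opportunity:

```sql
-- Freeze eligible pages eagerly during insert-triggered autovacuums:
ALTER TABLE pageview_log SET (autovacuum_freeze_min_age = 0);
```

This trades a little extra WAL during each autovacuum for pages that can be marked all-frozen right away, which suits tables whose rows are rarely updated or deleted after insertion.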
VACUUM may not be able to freeze every tuple's xmin in relatively rare cases. The criterion that determines basic eligibility for freezing is the same as the one that determines whether a deleted tuple can be removed: the XID-based removable cutoff that appears in the server log's autovacuum log reports (controlled by log_autovacuum_min_duration). In extreme cases, a long-running transaction can hold back every VACUUM's removable cutoff for so long that the system is forced to activate xidStopLimit mode protections.
Management of the XID address space by VACUUM
As noted already, freezing doesn't just allow queries to avoid lookups of subsidiary transaction status information in structures such as pg_xact. Freezing also plays a crucial role in enabling management of the XID address space by VACUUM. VACUUM maintains information about the oldest unfrozen XID that remains in the table when it uses its aggressive strategy. Aggressive VACUUM will update the table's pg_class.relfrozenxid to the value that it determined to be the oldest remaining XID; the table's relfrozenxid thereby “advances” by a certain number of XIDs. Aggressive VACUUM may also need to update the datfrozenxid column of the database's pg_database row in turn.
datfrozenxid
is a lower bound on the
unfrozen XIDs appearing in that database — it is just the
minimum of the per-table relfrozenxid
values (the relfrozenxid
that has
attained the greatest age) within the database.
Aggressive VACUUM also maintains the pg_class.relminmxid and pg_database.datminmxid fields. These are needed to track the oldest MultiXact ID that remains in the table and database, respectively.
The extra steps performed within every aggressive
VACUUM
against every table have the overall
effect of tracking the oldest remaining unfrozen transaction ID
in the entire cluster (every table from every database).
Aggressive VACUUMs will (in the aggregate and over time) make sure that the oldest unfrozen transaction ID in the entire system is never too far in the past.
Freezing removes local dependencies on
external transaction status information from individual heap
pages. Advancing relfrozenxid
removes global dependencies from whole
tables in turn.
The oldest XID in the entire cluster can be thought of as the
beginning of the XID space, while the next unallocated XID can
be thought of as the end of the XID space. This space
represents the range of XIDs that might still require transaction commit/abort status lookups in pg_xact.
The maximum XID age that the system can tolerate (i.e., the maximum “distance” between the oldest unfrozen transaction ID in any table according to pg_class.relfrozenxid, and the next unallocated transaction ID) is about 2.1 billion transaction IDs. This “maximum XID age” invariant makes it fundamentally impossible to put off aggressive VACUUMs (and freezing) forever [18]. The invariant imposes an absolute hard limit on how long any table can go without an aggressive VACUUM.
If the hard limit is ever reached, then the system will activate xidStopLimit mode, which temporarily prevents the allocation of new permanent transaction IDs. The system will only deactivate xidStopLimit mode when VACUUM (typically run by autovacuum) succeeds in advancing the oldest datfrozenxid in the cluster (via an aggressive VACUUM that runs to completion against the table that has the oldest relfrozenxid).
The 2.1 billion XIDs “maximum XID age” invariant
must be preserved because transaction IDs stored in heap row
headers use a truncated 32-bit representation (rather than the
full 64-bit representation). Since all unfrozen transaction IDs
from heap tuple headers must be from the
same transaction ID epoch (or from a space in the 64-bit
representation that spans two adjoining transaction ID epochs),
there isn't any need to store a separate epoch field in each
tuple header (see Section 74.1.2.1 for
further details). This scheme has the advantage of requiring
much less on-disk storage space than a design that stores an XID
epoch alongside each XID stored in each heap tuple header. It
has the disadvantage of constraining the system's ability to
allocate new XIDs in the worst case scenario where
xidStopLimit
mode is used to preserve the
“maximum XID age” invariant.
There is only one major runtime behavioral difference between aggressive mode VACUUMs and non-aggressive VACUUMs: only non-aggressive VACUUMs will skip pages that don't have any dead row versions even if those pages still have row versions with old XID values (pages marked as all-visible in the visibility map). Aggressive VACUUMs can only skip pages that are marked as both all-visible and all-frozen. Consequently, non-aggressive VACUUMs usually won't freeze every page containing an XID that has already attained an age of vacuum_freeze_min_age or more. Failing to freeze older pages during non-aggressive VACUUMs may lead to aggressive VACUUMs that perform a disproportionately large amount of the work of freezing required by one particular table.
When the VACUUM
command's
VERBOSE
parameter is specified,
VACUUM
prints various statistics about the
table. Its output includes information about how
relfrozenxid
and
relminmxid
advanced, and the number
of newly frozen pages. The same details appear in the server
log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum.
In practice, most tables require periodic aggressive vacuuming.
However, some individual non-aggressive
VACUUM
operations may be able to advance
relfrozenxid
and/or
relminmxid
. Non-aggressive
relfrozenxid
/relminmxid
advancement is most common in small, frequently modified tables.
Most individual tables will eventually need an aggressive VACUUM, which will reliably freeze all pages with XID (or MultiXact ID) values older than vacuum_freeze_min_age (or older than vacuum_multixact_freeze_min_age), including those from all-visible but not all-frozen pages (and then advance pg_class.relfrozenxid to a value that reflects all that). vacuum_freeze_table_age controls when VACUUM must use its aggressive strategy. If age(relfrozenxid) exceeds vacuum_freeze_table_age at the start of VACUUM, that VACUUM will use the aggressive strategy; otherwise the standard non-aggressive strategy is used. Setting vacuum_freeze_table_age to 0 forces VACUUM to always use its aggressive strategy.
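For example, a manually issued VACUUM can be forced to use the aggressive strategy by clearing the threshold for the current session (the table name is illustrative):

```sql
SET vacuum_freeze_table_age = 0;
VACUUM (VERBOSE) measurements;  -- VERBOSE output reports relfrozenxid advancement
RESET vacuum_freeze_table_age;
```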
To ensure that every table has its
relfrozenxid
advanced at somewhat
regular intervals, even in the case of completely static tables,
autovacuum runs against any table that might contain unfrozen
rows with XIDs older than the age specified by the configuration
parameter autovacuum_freeze_max_age. These
are anti-wraparound autovacuums.
Anti-wraparound autovacuums can happen even when autovacuum is nominally disabled in postgresql.conf.
In practice, all anti-wraparound autovacuums will use VACUUM's aggressive strategy (if they didn't, then it would defeat the whole purpose of anti-wraparound autovacuuming). Use of VACUUM's aggressive strategy is certain, because the effective value of vacuum_freeze_table_age is silently “clamped” to a value no greater than 95% of the current value of autovacuum_freeze_max_age.
As a rule of thumb, vacuum_freeze_table_age should be set to a value somewhat below autovacuum_freeze_max_age, so that there is a window during which any autovacuum triggered by inserts, updates, or deletes (or any manually issued VACUUM) will become an aggressive VACUUM. Such VACUUMs will reliably advance relfrozenxid in passing, even though autovacuum won't have specifically set out to make sure relfrozenxid advances through anti-wraparound autovacuuming. Anti-wraparound autovacuums may never be required at all in tables that regularly require vacuuming to reclaim space from dead tuples and/or to set pages all-visible in the visibility map (especially if vacuum_freeze_table_age is set to a value significantly below autovacuum_freeze_max_age).
Aggressive VACUUM is a special form of VACUUM. An aggressive VACUUM must advance relfrozenxid up to an XID value that is no greater than vacuum_freeze_min_age XIDs in age as of the start of the VACUUM operation.
Anti-wraparound autovacuum is a special form of autovacuum. Its purpose is to make sure that relfrozenxid is advanced when no earlier aggressive VACUUM ran and advanced relfrozenxid in passing (often because no VACUUM needed to run against the table at all).
There is only one runtime behavioral difference between anti-wraparound autovacuums and other autovacuums that happen to end up running an aggressive VACUUM: anti-wraparound autovacuums cannot be autocancelled. This means that autovacuum workers that perform anti-wraparound autovacuuming do not yield to conflicting relation-level lock requests (e.g., from ALTER TABLE). See Section 25.1.3 for a full explanation.
VACUUM also applies vacuum_multixact_freeze_table_age and autovacuum_multixact_freeze_max_age. These are independent MultiXact ID based triggers of aggressive VACUUM (and anti-wraparound autovacuum). They are applied by following rules analogous to the rules already described for vacuum_freeze_table_age and autovacuum_freeze_max_age, respectively [19].
It doesn't matter whether it was vacuum_freeze_table_age or vacuum_multixact_freeze_table_age that triggered VACUUM's decision to use its aggressive strategy. Every aggressive VACUUM will advance relfrozenxid and relminmxid by following the same generic steps at runtime.
A convenient way to examine information about
relfrozenxid
and
relminmxid
is to execute queries such as:
SELECT c.oid::regclass AS table_name,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS xid_age,
       mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname,
       age(datfrozenxid) AS xid_age,
       mxid_age(datminmxid)
FROM pg_database;
The age function returns the number of transactions from relfrozenxid to the next unallocated transaction ID. The mxid_age function returns the number of MultiXact IDs from relminmxid to the next unallocated MultiXact ID.
The system should always have significant XID allocation slack capacity. Ideally, the greatest age(relfrozenxid)/age(datfrozenxid) in the system will never be more than a fraction of the 2.1 billion XID hard limit described in Section 25.2.2.1. The default autovacuum_freeze_max_age setting of 200 million transactions implies that the system should never use significantly more than about 10% of that hard limit.
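A query along these lines (2.1 billion is the approximate hard limit discussed above, used here as a round illustrative constant) expresses each database's oldest XID age as a fraction of that limit:

```sql
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2100000000, 2) AS pct_of_hard_limit
FROM pg_database
ORDER BY xid_age DESC;
```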
There is little advantage in routinely allowing the greatest age(relfrozenxid) in the system to get anywhere near the 2.1 billion XID hard limit. Putting off the work of freezing can only reduce the absolute amount of WAL written by VACUUM when VACUUM thereby completely avoids freezing rows that are deleted before long anyway. There is little or no disadvantage to lowering vacuum_freeze_table_age to make aggressive VACUUMs more frequent, at least in tables where newly frozen pages almost always remain all-frozen forever. Note also that anything that leads to relfrozenxid and relminmxid advancing less frequently (such as a higher vacuum_freeze_table_age setting) will also increase the on-disk space required to store additional transaction status information, as described in Section 25.2.4.
xidStopLimit mode
If for some reason autovacuum utterly fails to advance any
table's relfrozenxid
or
relminmxid
for an extended period, and
if XIDs and/or MultiXact IDs continue to be allocated, the system
will begin to emit warning messages like this when the database's
oldest XIDs reach forty million transactions from the 2.1 billion
XID hard limit described in Section 25.2.2.1:
WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.)
If these warnings are ignored, the system will eventually refuse
to start any new transactions. This happens at the point that
there are fewer than three million transactions left:
ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the required VACUUM commands. However, since the system will not execute commands once it has gone into the safety shutdown mode, the only way to do this is to stop the server and restart it in single-user mode to execute VACUUM. The shutdown mode is not enforced in single-user mode. See the postgres reference page for details about using single-user mode.
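A single-user mode session might be started along these lines (the data directory path and database name are placeholders; consult the postgres reference page for the authoritative invocation):

```
postgres --single -D /usr/local/pgsql/data mydb
```

Once in single-user mode, issue VACUUM at the prompt, then shut the session down and restart the server normally.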
In emergencies, VACUUM
will take extraordinary
measures to avoid xidStopLimit
mode. A
failsafe mechanism is triggered when the table's
relfrozenxid
attains an age of vacuum_failsafe_age XIDs, or when the table's
relminmxid
attains an age of vacuum_multixact_failsafe_age MultiXact IDs.
The failsafe prioritizes advancing
relfrozenxid
and/or
relminmxid
as quickly as possible.
Once the failsafe triggers, VACUUM
bypasses
all remaining non-essential maintenance tasks, and stops applying
any cost-based delay that was in effect. Any Buffer Access
Strategy in use will also be disabled.
VACUUM
maintains a visibility map for each table to keep
track of which pages contain only tuples that are known to be
visible to all active transactions (and all future transactions,
at least until the page is modified). A separate bit tracks
whether all of the tuples are frozen.
The visibility map serves two purposes. First, VACUUM itself can skip such pages on the next run, since there is nothing to clean up. Even aggressive VACUUMs can skip pages that are both all-visible and all-frozen.
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
Anything that influences when and how
relfrozenxid
and
relminmxid
advance will also directly
affect the high watermark storage overhead needed to store
historical transaction status information. For example,
increasing autovacuum_freeze_max_age
(and
vacuum_freeze_table_age
along with it) will
make the pg_xact
and
pg_commit_ts
subdirectories of the database
cluster take more space, because they store the commit/abort
status and (if track_commit_timestamp
is enabled)
timestamp of all transactions back to the
datfrozenxid
horizon (the earliest
datfrozenxid
among all databases in the
cluster).
The commit status uses two bits per transaction. The default
autovacuum_freeze_max_age
setting of 200
million transactions translates to about 50MB of
pg_xact
storage. When
track_commit_timestamp
is enabled, about 2GB of
pg_commit_ts
storage will also be required.
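The arithmetic behind the 50MB figure can be checked directly (two bits per transaction, 200 million transactions, 8 bits per byte):

```sql
-- 200 million XIDs * 2 bits / 8 bits-per-byte = 50 million bytes
SELECT pg_size_pretty((200000000::bigint * 2 / 8));
```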
MultiXact ID status information storage uses two separate underlying SLRU storage areas: pg_multixact/members and pg_multixact/offsets. There is no simple formula to determine the storage overhead per MultiXact ID, since in general MultiXact IDs have a variable number of member XIDs. Note, however, that if pg_multixact/members exceeds 2GB, then the effective value of autovacuum_multixact_freeze_max_age used by VACUUM will be lower, resulting in more frequent aggressive mode VACUUMs.
Truncation of transaction status information is only possible at the end of VACUUMs that advance the earliest relfrozenxid (in the case of pg_xact and pg_commit_ts), or the earliest relminmxid (in the case of pg_multixact/members and pg_multixact/offsets) among all tables in the entire database (assuming that it's the database with the earliest datfrozenxid and datminmxid in the entire cluster).
The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics; otherwise poor choices of plans might degrade database performance.
The autovacuum daemon, if enabled, will automatically issue
ANALYZE
commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled ANALYZE
operations, particularly
if it is known that update activity on a table will not affect the
statistics of “interesting” columns. The daemon schedules
ANALYZE
strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to
run ANALYZE
on the parent table manually in order to
keep the statistics up to date.
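For example (the table name is illustrative), a periodic job could refresh the inheritance-tree statistics by hand:

```sql
-- Refresh statistics for the inheritance tree rooted at this parent:
ANALYZE VERBOSE measurements;
```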
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a timestamp
column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
It is possible to run ANALYZE
on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. ANALYZE
uses a
statistically random sampling of the rows of a table rather than reading
every single row.
Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.
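For example (the table and column names are illustrative), a column with an irregular distribution can be given a larger statistics target and then re-analyzed:

```sql
-- Collect a finer-grained histogram for one column, then refresh its stats:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders (status);
```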
Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE
commands for
foreign tables, since it has no means of determining how often that
might be useful. If your queries require statistics on foreign tables
for proper planning, it's a good idea to run manually-managed
ANALYZE
commands on those tables on a suitable schedule.
The autovacuum daemon does not issue ANALYZE commands for partitioned tables. Inheritance parents will only be analyzed if the parent itself is changed; changes to child tables do not trigger autoanalyze on the parent table. If your queries require statistics on parent tables for proper planning, it is necessary to periodically run a manual ANALYZE on those tables to keep the statistics up to date.
[15]
In this regard freezing is unlike setting transaction status
“hint bits” in tuple headers: setting hint bits
doesn't usually need to be WAL-logged, and
can take place on physical replication standby servers without
the involvement of the primary server. The purpose of hint bits
is to avoid repeat pg_xact
lookups for the
same tuples, strictly as an optimization. The purpose of
freezing (from the point of view of individual tuples) is to
reliably remove each tuple's dependency on
pg_xact
, ultimately making it safe to
truncate pg_xact
from time to time.
[16]
“Freezing” of xmax fields (whether they were found to contain an XID or a MultiXact ID) generally means clearing xmax. VACUUM may occasionally encounter an individual MultiXact ID that must be removed to advance relminmxid by the required amount, which can only be processed by generating a replacement MultiXact ID (containing just the non-removable subset of member XIDs from the original MultiXact ID), and then setting the tuple's xmax to the new/replacement MultiXact ID value.
[17]
Actually, the “freeze on an FPI
write” mechanism isn't just triggered whenever
VACUUM
needed to write an
FPI for torn page protection as part of
writing a PRUNE
WAL record
describing how dead tuples were removed. The
FPI mechanism can also be triggered when hint
bits are set by VACUUM
, if and only if doing
so necessitates writing an FPI.
WAL-logging in order to set hint bits is only
possible when the wal_log_hints option is
enabled in postgresql.conf
, or when data
checksums were enabled when the cluster was initialized with
initdb.
[18]
Aggressive VACUUMs cannot be put off forever, barring the edge case where the installation is never expected to consume more than about 2.1 billion XIDs. In practice this edge case rarely has any practical relevance.
[19]
Though note that autovacuum (and VACUUM) use a lower “effective” autovacuum_multixact_freeze_max_age value (determined dynamically) to deal with issues with truncation of the SLRU storage areas, as explained in Section 25.2.4.