25.2. Autovacuum Maintenance Tasks #

25.2.1. Recovering Disk Space
25.2.2. Freezing to manage the transaction ID space
25.2.3. Updating the Visibility Map
25.2.4. Truncating transaction status information
25.2.5. Updating Planner Statistics

PostgreSQL databases require periodic maintenance known as vacuuming, and require periodic updates to the statistics used by the PostgreSQL query planner. These maintenance tasks are performed by the VACUUM and ANALYZE commands respectively. For most installations, it is sufficient to let the autovacuum daemon determine when to perform these maintenance tasks (which is partly determined by configurable table-level thresholds; see Section 25.1).

The autovacuum daemon has to process each table on a regular basis for several reasons:

  1. To recover or reuse disk space occupied by updated or deleted rows.
  2. To maintain the system's ability to allocated new transaction IDs (and new multixact IDs) through freezing.
  3. To update the visibility map, which speeds up index-only scans, and helps the next VACUUM operation avoid needlessly scanning pages that are already frozen
  4. To truncate obsolescent transaction status information, when possible
  5. To update data statistics used by the PostgreSQL query planner.

Maintenance work within the scope of items 1, 2, 3, and 4 is performed by the VACUUM command internally. Item 5 (maintenance of planner statistics) is handled by the ANALYZE command internally. Although this section presents information about autovacuum, there is no difference between manually-issued VACUUM and ANALYZE commands and those run by the autovacuum daemon (though there are autovacuum-specific variants of a small number of settings that control VACUUM).

Autovacuum creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.

Some database administrators will want to supplement the daemon's activities with manually-managed VACUUM commands, which typically are executed according to a schedule by cron or Task Scheduler scripts. It can be useful to perform off-hours VACUUM commands during periods where reduced load is expected.

25.2.1. Recovering Disk Space #

In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (MVCC, see Chapter 13): the row version must not be deleted while it is still potentially visible to other transactions. A deleted row version (whether from an UPDATE or DELETE) will usually cease to be of interest to any still running transaction shortly after the original deleting transaction commits.

The space dead tuples occupy must eventually be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. Reclaiming space from dead rows is VACUUM's main responsibility.

The XID cutoff point that VACUUM uses to determine whether or not a deleted tuple is safe to physically remove is reported under removable cutoff in the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a VACUUM operation executed by autovacuum. Tuples that are not yet safe to remove are counted as dead but not yet removable tuples in the log report. VACUUM establishes its removable cutoff once, at the start of the operation. Any older snapshot (or transaction that allocates an XID) that's still running when the cutoff is established may hold it back.

Tip

It's important that no long running transactions ever be allowed to hold back every VACUUM operation's cutoff for an extended period. You may wish to monitor this.

Note

Tuples inserted by aborted transactions can be removed by VACUUM immediately

VACUUM will not return space to the operating system, except in the special case where a group of contiguous pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. This relation truncation behavior can be disabled in tables where the exclusive lock is disruptive by setting the table's vacuum_truncate storage parameter to off.

Tip

If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than relying on VACUUM. TRUNCATE removes the entire contents of the table immediately, avoiding the need to set xmax to the deleting transaction's XID. One disadvantage is that strict MVCC semantics are violated.

Tip

VACUUM FULL or CLUSTER can be useful when dealing with extreme amounts of dead tuples. It can reclaim more disk space, but runs much more slowly. It rewrites an entire new copy of the table and rebuilds all indexes. This typically has much higher overhead than VACUUM. Generally, therefore, administrators should avoid using VACUUM FULL except in the most extreme cases.

Note

Although VACUUM FULL is technically an option of the VACUUM command, VACUUM FULL uses a completely different implementation. VACUUM FULL is essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the original implementation was somewhat closer to standard VACUUM.)

Warning

TRUNCATE, VACUUM FULL, and CLUSTER all require an ACCESS EXCLUSIVE lock, which can be highly disruptive (SELECT, INSERT, UPDATE, and DELETE commands will all be blocked).

Warning

VACUUM FULL (and CLUSTER) temporarily uses extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.

25.2.2. Freezing to manage the transaction ID space #

VACUUM often marks certain pages frozen, indicating that all eligible rows on the page were inserted by a transaction that committed sufficiently far in the past that the effects of the inserting transaction are certain to be visible to all current and future transactions. The specific transaction ID number (XID) stored in a frozen heap row's xmin field is no longer needed to determine anything about the row's visibility. Furthermore, when a row undergoing freezing happens to have an XID set in its xmax field (possibly an XID left behind by an earlier SELECT FOR UPDATE row locker), the xmax field's XID is typically also removed (actually, xmax is set to the special XID value 0, also known as InvalidTransactionId). See Section 73.6.1.1 for further background information.

Once frozen, heap pages are self-contained. All of the page's rows can be read by every transaction, without any transaction ever needing to consult externally stored transaction status metadata (most notably, transaction commit/abort status information from pg_xact won't ever be required).

It can be useful for VACUUM to put off some of the work of freezing, but freezing cannot be put off indefinitely. Since on-disk storage of transaction IDs in heap row headers uses a truncated 32-bit representation to save space (rather than the full 64-bit representation), freezing plays a crucial role in enabling management of the XID address space by VACUUM. If freezing by VACUUM is somehow impeded (in a database that continues to allocate new transaction IDs), the system will eventually refuse to allocate new transaction IDs. This generally only happens in extreme cases where the system has been misconfigured.

vacuum_freeze_min_age can be used to control when freezing takes place. When VACUUM scans a heap page containing even one XID that already attained an age exceeding this value, the page is frozen.

Multixact IDs are used to support row locking by multiple transactions. Since there is only limited space in a tuple header to store lock information, that information is encoded as a multiple transaction ID, or multixact ID for short, whenever there is more than one transaction concurrently locking a row. Information about which transaction IDs are included in any particular multixact ID is stored separately, and only the multixact ID appears in the xmax field in the tuple header. Like transaction IDs, multixact IDs are implemented as a 32-bit counter and corresponding storage. Since MultiXact IDs are stored in the xmax field of heap rows (and have an analogous dependency on external transaction status information), they may also need to be removed during freezing.

vacuum_multixact_freeze_min_age also controls when freezing takes place. It is analogous to vacuum_freeze_min_age, but age is expressed in units of Multixact ID (not in units of XID). vacuum_multixact_freeze_min_age typically has only a minimal impact on how many pages are frozen, partly because VACUUM usually prefers to remove MultiXact IDs proactively based on low-level considerations around the cost of freezing. vacuum_multixact_freeze_min_age forces VACUUM to process MultiXact IDs in certain rare cases where the implementation would not ordinarily do so.

Managing the added WAL volume from freezing over time is an important consideration for VACUUM. This is why VACUUM doesn't just freeze every eligible tuple at the earliest opportunity: the WAL written to freeze a page's tuples goes to waste in cases where the resulting frozen tuples are soon deleted or updated anyway. It's also why VACUUM will freeze all eligible tuples from a heap page once the decision to freeze at least one tuple is taken: at that point the added cost to freeze all eligible tuples eagerly (measured in extra bytes of WAL written) is far lower than the probable cost of deferring freezing until a future VACUUM operation against the same table. Furthermore, once the page is frozen it can be marked all-frozen in the visibility map right away.

Note

In PostgreSQL versions before 16, freezing was triggered at the level of individual xmin and xmax fields. Freezing only affected the exact XIDs that had already attained an age at or exceeding vacuum_freeze_min_age, regardless of costs.

VACUUM also triggers freezing of pages in cases where it already proved necessary to write out an FPI (full page image) alongside a WAL record generated while removing dead tuples (see Section 30.1 for background information about how FPIs provide torn page protection). This freeze on an FPI write mechanism is designed to lower the absolute volume of WAL written over time by VACUUM, across multiple VACUUM operations against the same table. The mechanism often prevents future VACUUM operations from having to write a second FPI for the same page much later on. In effect, VACUUM writes slightly more WAL in the short term with the aim of ultimately needing to write much less WAL in the long term.

VACUUM may not be able to freeze every tuple's xmin in relatively rare cases. The criteria that determines basic eligibility for freezing is exactly the same as the one that determines if a deleted tuple should be considered removable or merely dead but not yet removable (namely, the XID-based removable cutoff). In extreme cases a long-running transaction can hold back every VACUUM's removable cutoff for so long that the system is eventually forced to activate xidStopLimit mode protections.

25.2.2.1. VACUUM's aggressive strategy #

As already noted briefly in the introductory section, freezing doesn't just allow queries to avoid lookups of subsidiary transaction status information in structures such as pg_xact. Freezing also plays a crucial role in enabling management of the XID address space by VACUUM.

VACUUM maintains information about the oldest unfrozen XID that remains in the table when it uses its aggressive strategy. This information is stored in the pg_class system table at the end of each aggressive VACUUM: the table processed by aggressive VACUUM has its pg_class.relfrozenxid updated (relfrozenxid advances by a certain number of XIDs). Similarly, the datfrozenxid column of a database's pg_database row is a lower bound on the unfrozen XIDs appearing in that database — it is just the minimum of the per-table relfrozenxid values within the database. The system also maintains pg_class.relminmxid and pg_database.datminmxid fields to track the oldest MultiXact ID, while following analogous rules.

Tip

When the VACUUM command's VERBOSE parameter is specified, VACUUM prints various statistics about the table. This includes information about how relfrozenxid and relminmxid advanced, and the number of newly frozen pages. The same details appear in the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a VACUUM operation executed by autovacuum.

This process is intended to reliably prevent the entire database from ever having a transaction ID that is excessively far in the past. The maximum distance that the system can tolerate between the oldest unfrozen transaction ID and the next (unallocated) transaction ID is about 2.1 billion transaction IDs. That is an upper limit; the greatest age(relfrozenxid)/age(datfrozenxid) in the system should ideally never exceed a fraction of that upper limit. If that upper limit is ever reached, then the system will activate xidStopLimit mode protections. These protections will remain in force until VACUUM (typically autovacuum) manages to advance the oldest datfrozenxid in the cluster (by advancing that database's oldest relfrozenxid via an aggressive VACUUM).

The 2.1 billion XIDs distance invariant is a consequence of the fact that on-disk storage of transaction IDs in heap row headers uses a truncated 32-bit representation to save space (rather than the full 64-bit representation). Since all unfrozen transaction IDs from heap tuple headers must be from the same transaction ID epoch (which is what the invariant actually assures), there isn't any need to store a separate epoch field in each tuple header. The downside is that the system depends on freezing (and relfrozenxid advancement during aggressive VACUUMs) to make sure that the available supply of transaction IDs never exceeds the demand.

Note

In practice most tables require periodic aggressive vacuuming. However, some individual non-aggressive VACUUM operations may be able to advance relfrozenxid and/or relminmxid. This is most common in small, frequently modified tables, where VACUUM happens to scan all pages (or at least all pages not marked all-frozen in the visibility map) in the course of removing dead tuples.

VACUUM/autovacuum also use vacuum_multixact_freeze_table_age and autovacuum_multixact_freeze_max_age settings as independent Multixact ID orientated controls for aggressive mode VACUUM and anti-wraparound autovacuum. These work analogously to the XID-based vacuum_freeze_table_age and autovacuum_freeze_max_age, respectively. Note, however, that if the multixacts members storage area exceeds 2GB, then the effective value of autovacuum_multixact_freeze_max_age will be lower, resulting in more frequent aggressive mode VACUUMs.

There is only one major runtime behavioral differences between aggressive mode VACUUM and non-aggressive (standard) VACUUM. Both kinds of VACUUM use the visibility map to determine which pages of a table must be scanned, and which can be skipped. However, only non-aggressive VACUUM will skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values; aggressive VACUUMs are limited to skipping pages already marked all-frozen (and all-visible).

As a consequence of all this, non-aggressive VACUUMs usually won't freeze every page with an old row version in the table. Most individual tables will eventually need an aggressive VACUUM, which will reliably freeze all pages with XID and MXID values older than vacuum_freeze_min_age, including those from all-visible but not all-frozen pages (and then update pg_class). vacuum_freeze_table_age controls when VACUUM must use its aggressive strategy. Since the setting is applied against age(relfrozenxid), settings like vacuum_freeze_min_age may influence the exact cadence of aggressive vacuuming. Setting vacuum_freeze_table_age to 0 forces VACUUM to always use its aggressive strategy.

Note

Aggressive VACUUMs apply the same rules for freezing as non-aggressive VACUUMs. You may nevertheless notice that aggressive VACUUMs perform a disproportionately large amount of the total required freezing in larger tables.

This is an indirect consequence of the fact that non-aggressive VACUUMs won't scan pages that are marked all-visible but not also marked all-frozen in the visibility map. VACUUM can only consider freezing those pages that it actually gets to scan.

Note in particular that vacuum_freeze_min_age isn't very likely to trigger freezing in non-aggressive VACUUMs, at least with default settings. The freeze on an FPI write mechanism is somewhat more likely to trigger in non-aggressive VACUUMs in practice, though. Much depends on workload characteristics.

To ensure that every table has its relfrozenxid advanced at somewhat regular intervals, including totally static tables, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age. This will happen even if autovacuum is disabled.

In practice all anti-wraparound autovacuums will use VACUUM's aggressive strategy. This is assured because the effective value of vacuum_freeze_table_age is clamped to a value no greater than 95% of the current value of autovacuum_freeze_max_age. As a rule of thumb, vacuum_freeze_table_age should be set to a value somewhat below autovacuum_freeze_max_age, leaving enough gap so that a regularly scheduled VACUUM or an autovacuum triggered by inserts, updates and deletes is run in that window. Anti-wraparound autovacuums can be avoided altogether in tables that reliably receive some VACUUMs that use the aggressive strategy.

A convenient way to examine information about relfrozenxid and relminmxid is to execute queries such as:

SELECT c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),
         age(t.relfrozenxid)) as xid_age,
mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname,
age(datfrozenxid) as xid_age,
mxid_age(datminmxid)
FROM pg_database;

The age column measures the number of transactions from the cutoff XID to the next unallocated transactions ID. The mxid_age column measures the number of MultiXactIds from the cutoff MultiXactId to the next unallocated multixact ID.

25.2.2.2. xidStopLimit mode #

If for some reason autovacuum utterly fails to advance any table's relfrozenxid or relminmxid for an extended period, and if XIDs and/or MultiXactIds continue to be allocated, the system will begin to emit warning messages like this when the database's oldest XIDs reach forty million transactions from the wraparound point:

WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will eventually refuse to start any new transactions. This happens at the point that there are fewer than three million transactions left:

ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT:  Stop the postmaster and vacuum that database in single-user mode.

The three-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the required VACUUM commands. However, since the system will not execute commands once it has gone into the safety shutdown mode, the only way to do this is to stop the server and start the server in single-user mode to execute VACUUM. The shutdown mode is not enforced in single-user mode. See the postgres reference page for details about using single-user mode.

Anything that influences when and how relfrozenxid and relminmxid advance will also directly affect the high watermark storage overhead from storing a great deal of historical transaction status information. The additional space overhead is usually of fairly minimal concern. It is noted as an additional downside of allowing the system to get close to xidStopLimit for the sake of completeness.

Historical Note

The term wraparound is inaccurate. Also, there is no data loss here — the message is simply wrong.

XXX: We really need to fix the situation with single user mode to put things on a good footing.

In emergencies, VACUUM will take extraordinary measures to avoid xidStopLimit mode. A failsafe mechanism is triggered when thresholds controlled by vacuum_failsafe_age and vacuum_multixact_failsafe_age are reached. The failsafe prioritizes advancing relfrozenxid and/or relminmxid as quickly as possible. Once the failsafe triggers, VACUUM bypasses all remaining non-essential maintenance tasks, and stops applying any cost-based delay that was in effect. Any Buffer Access Strategy in use will also be disabled.

25.2.3. Updating the Visibility Map #

Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). This has two purposes. First, vacuum itself can skip such pages on the next run, since there is nothing to clean up. Even VACUUMs that use the aggressive strategy can skip pages that are both all-visible and all-frozen (the visibility map keeps track of which pages are all-frozen separately).

Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.

25.2.4. Truncating transaction status information #

As noted in Section 25.2.2.2, anything that influences when and how relfrozenxid and relminmxid advance will also directly affect the high watermark storage overhead needed to store historical transaction status information. For example, increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) will make the pg_xact and pg_commit_ts subdirectories of the database cluster take more space, because they store the commit status and (if track_commit_timestamp is enabled) timestamp of all transactions back to the datfrozenxid horizon (the earliest datfrozenxid in the entire cluster).

The commit status uses two bits per transaction. The default autovacuum_freeze_max_age setting of 200 million transactions translates to about 50MB of pg_xact storage. When track_commit_timestamp is enabled, about 2GB of pg_commit_ts storage will also be required.

MultiXactId status information is implemented as two separate SLRU storage areas: pg_multixact/members, and pg_multixact/offsets. There is no simple formula to determine the storage overhead per MultiXactId, since MultiXactIds have a variable number of member XIDs.

Truncating of transaction status information is only possible at the end of VACUUMs that advance relfrozenxid (in the case of pg_xact and pg_commit_ts) or relminmxid (in the case of (pg_multixact/members and pg_multixact/offsets) of whatever table happened to have the oldest value in the cluster when the VACUUM began. This typically happens very infrequently, often during aggressive strategy VACUUMs of one of the database's largest tables.

25.2.5. Updating Planner Statistics #

The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance.

The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually-scheduled ANALYZE operations, particularly if it is known that update activity on a table will not affect the statistics of interesting columns. The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.

Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run ANALYZE on the parent table manually in order to keep the statistics up to date.

As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there might be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a timestamp column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column might receive changes just as often, but the statistical distribution of its values probably changes relatively slowly.

It is possible to run ANALYZE on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. ANALYZE uses a statistically random sampling of the rows of a table rather than reading every single row.

Tip

Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.

Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.

Tip

The autovacuum daemon does not issue ANALYZE commands for foreign tables, since it has no means of determining how often that might be useful. If your queries require statistics on foreign tables for proper planning, it's a good idea to run manually-managed ANALYZE commands on those tables on a suitable schedule.

Tip

The autovacuum daemon does not issue ANALYZE commands for partitioned tables. Inheritance parents will only be analyzed if the parent itself is changed - changes to child tables do not trigger autoanalyze on the parent table. If your queries require statistics on parent tables for proper planning, it is necessary to periodically run a manual ANALYZE on those tables to keep the statistics up to date.