25.2. Autovacuum Maintenance Tasks #

25.2.1. Recovering Disk Space
25.2.2. Freezing to manage the transaction ID space
25.2.3. Updating the Visibility Map
25.2.4. Truncating transaction status information
25.2.5. Updating Planner Statistics

PostgreSQL databases require periodic maintenance known as vacuuming, and require periodic updates to the statistics used by the PostgreSQL query planner. These maintenance tasks are performed by the VACUUM and ANALYZE commands respectively. For most installations, it is sufficient to let the autovacuum daemon determine when to perform these maintenance tasks (which is partly determined by configurable table-level thresholds; see Section 25.1).

The autovacuum daemon has to process each table on a regular basis for several reasons:

  1. To recover or reuse disk space occupied by updated or deleted rows.
  2. To maintain the system's ability to allocate new transaction IDs (and new multixact IDs) through freezing.
  3. To update the visibility map, which speeds up index-only scans, and helps the next VACUUM operation avoid needlessly scanning pages that are already frozen.
  4. To truncate obsolescent transaction status information, when possible.
  5. To update data statistics used by the PostgreSQL query planner.

Maintenance work within the scope of items 1, 2, 3, and 4 is performed by the VACUUM command internally. Item 5 (maintenance of planner statistics) is handled by the ANALYZE command internally. Although this section presents information about autovacuum, there is no difference between manually-issued VACUUM and ANALYZE commands and those run by the autovacuum daemon (though there are autovacuum-specific variants of a small number of settings that control VACUUM).

Autovacuum creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.

Some database administrators will want to supplement the daemon's activities with manually-managed VACUUM commands, which typically are executed according to a schedule by cron or Task Scheduler scripts. It can be useful to perform off-hours VACUUM commands during periods where reduced load is expected.

25.2.1. Recovering Disk Space #

In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (MVCC, see Chapter 13): the row version must not be deleted while it is still potentially visible to other transactions. A deleted row version (whether from an UPDATE or DELETE) will usually cease to be of interest to any still running transaction shortly after the original deleting transaction commits.

The space dead tuples occupy must eventually be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. Reclaiming space from dead rows is VACUUM's main responsibility.

The XID cutoff point that VACUUM uses to determine whether or not a deleted tuple is safe to physically remove is reported under removable cutoff in the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a VACUUM operation executed by autovacuum. Tuples that are not yet safe to remove are counted as dead but not yet removable tuples in the log report. VACUUM establishes its removable cutoff once, at the start of the operation. Any older snapshot (or transaction that allocates an XID) that's still running when the cutoff is established may hold it back.

Tip

It's important that no long-running transactions ever be allowed to hold back every VACUUM operation's cutoff for an extended period. You may wish to monitor for this.
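
One way to do so is to watch the age of each session's backend_xmin in pg_stat_activity. A query along these lines (only a sketch; the threshold to alert on is up to you) lists the sessions currently holding back the cutoff the most:

SELECT pid, age(backend_xmin) AS xmin_age, state, xact_start
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xmin_age DESC;

Prepared transactions and replication slots can hold back cleanup in a similar way, so pg_prepared_xacts and pg_replication_slots may be worth monitoring as well.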

Note

Tuples inserted by aborted transactions can be removed by VACUUM immediately.

VACUUM will not return space to the operating system, except in the special case where a group of contiguous pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. This relation truncation behavior can be disabled in tables where the exclusive lock is disruptive by setting the table's vacuum_truncate storage parameter to off.
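
For example, to disable truncation for a hypothetical table named mytable:

ALTER TABLE mytable SET (vacuum_truncate = off);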

Tip

If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than using DELETE followed by VACUUM. TRUNCATE removes the entire contents of the table immediately, avoiding the need to set xmax to the deleting transaction's XID. One disadvantage is that strict MVCC semantics are violated.

Tip

VACUUM FULL or CLUSTER can be useful when dealing with extreme amounts of dead tuples. These commands can reclaim more disk space than VACUUM, but run much more slowly: each rewrites an entire new copy of the table and rebuilds all of its indexes, which typically has much higher overhead than VACUUM. Administrators should therefore generally avoid VACUUM FULL except in the most extreme cases.

Note

Although VACUUM FULL is technically an option of the VACUUM command, VACUUM FULL uses a completely different implementation. VACUUM FULL is essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the original implementation was somewhat closer to standard VACUUM.)

Warning

TRUNCATE, VACUUM FULL, and CLUSTER all require an ACCESS EXCLUSIVE lock, which can be highly disruptive (SELECT, INSERT, UPDATE, and DELETE commands will all be blocked).

Warning

VACUUM FULL (and CLUSTER) temporarily uses extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.

25.2.2. Freezing to manage the transaction ID space #

PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID numbers (XID) to determine whether or not the row is visible to each query's MVCC snapshot (see interpreting XID stamps from tuple headers). But since on-disk storage of transaction IDs in heap pages uses a truncated 32-bit representation to save space (rather than the full 64-bit representation), it is necessary to vacuum every table in every database at least once every two billion transactions (though far more frequent vacuuming is typical).

VACUUM marks pages as frozen, indicating that all eligible rows on the page were inserted by a transaction that committed sufficiently far in the past that the effects of the inserting transaction are certain to be visible to all current and future transactions. Frozen rows are treated as if the inserting transaction (the heap tuple's xmin XID) committed infinitely far in the past, making the effects of the transaction visible to all current and future MVCC snapshots, forever. Once frozen, pages are self-contained in the sense that all of their rows can be read without ever having to consult externally stored transaction status metadata (for example, transaction commit status information from pg_xact is not required).

Without freezing, the system will eventually refuse to allocate new transaction IDs, causing DML statements to throw errors until VACUUM restores the system's ability to allocate new transaction IDs. The system is unable to recognize distances of more than about 2.1 billion transactions between any two unfrozen XIDs. The only safe option that remains is to disallow allocating new XIDs until VACUUM has performed the steps required to make sure that the distance invariant is never violated.

vacuum_freeze_min_age can be used to control when freezing takes place. When VACUUM scans a heap page containing even one XID that has already attained an age exceeding this value, the page is frozen. All tuples from the page are frozen, making the page suitable for long-term storage (there is no longer any possible need to interpret the contents of the page using external transaction status information). Increasing this setting may avoid unnecessary work if the pages that would otherwise be frozen will soon be modified again, but decreasing it makes it more likely that some future VACUUM operation will need to perform an excessive amount of catch-up freezing, all in one go.
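
vacuum_freeze_min_age can also be set for an individual session before a manual VACUUM. A minimal sketch, with a purely illustrative value and a hypothetical table name:

SET vacuum_freeze_min_age = 10000000;
VACUUM mytable;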

Multixact IDs are used to support row locking by multiple transactions. Since there is only limited space in a tuple header to store lock information, that information is encoded as a multiple transaction ID, or multixact ID for short, whenever there is more than one transaction concurrently locking a row. Information about which transaction IDs are included in any particular multixact ID is stored separately, and only the multixact ID appears in the xmax field in the tuple header. Like transaction IDs, multixact IDs are implemented as a 32-bit counter and corresponding storage. Freezing of multixact IDs is also required, though this actually means setting xmax to InvalidTransactionId. (Actually, any xmax field is processed in the same way when its page is frozen, including xmax fields that contain simple XIDs.)

vacuum_multixact_freeze_min_age can also influence which pages VACUUM freezes. Like vacuum_freeze_min_age, the setting triggers page-level freezing. However, vacuum_multixact_freeze_min_age uses MultiXactIds (not XIDs) to measure age.

Note

In PostgreSQL versions before 16, freezing was triggered at the level of individual xmin and xmax fields. The decision to freeze (or not freeze) is now made at the level of whole heap pages, at the point that they are scanned by VACUUM.

In general, the major cost of freezing is the additional WAL volume. That's why VACUUM doesn't just freeze every eligible tuple at the earliest opportunity: freezing will go to waste in cases where a recently frozen tuple soon goes on to be deleted anyway. Managing the added WAL volume from freezing over time is an important consideration for VACUUM.

VACUUM also triggers freezing of pages in cases where it already proved necessary to write out an FPI to the WAL as torn page protection (as part of removing dead tuples). The extra WAL volume from proactive freezing is insignificant compared to the cost of the FPI. It is very likely (though not quite certain) that the overall volume of WAL will be lower in the long term for tables that have most freezing triggered by the FPI mechanism, since (at least on average) future VACUUMs shouldn't have to write out a second FPI much later on, when freezing becomes strictly necessary. The FPI freezing mechanism is just an alternative trigger criterion for freezing all eligible tuples on the page. In general, VACUUM freezes pages without regard to the condition that triggered freezing: the physical modifications to the page (how tuples are processed) are decoupled from the mechanism that decides whether those modifications should happen at all.

VACUUM may not be able to freeze every tuple's xmin in relatively rare cases. The criterion that determines eligibility for freezing is exactly the same as the one that determines whether a deleted tuple should be considered removable or merely dead but not yet removable (namely, the XID-based removable cutoff).

25.2.2.1. VACUUM's aggressive strategy #

To track the age of the oldest unfrozen XIDs in a database, VACUUM stores XID statistics in the system tables pg_class and pg_database. In particular, the relfrozenxid column of a table's pg_class row contains the oldest remaining unfrozen XID at the end of the most recent VACUUM that successfully advanced relfrozenxid. Similarly, the datfrozenxid column of a database's pg_database row is a lower bound on the unfrozen XIDs appearing in that database — it is just the minimum of the per-table relfrozenxid values within the database.

VACUUM uses the visibility map to determine which pages of a table must be scanned. Normally, it will skip pages that don't have any dead row versions even if those pages might still have row versions with old XID values. Therefore, normal VACUUMs usually won't freeze every page with an old row version in the table. Most individual tables will eventually need an aggressive vacuum, which will reliably freeze all pages with XID and MXID values older than vacuum_freeze_min_age, including those from all-visible but not all-frozen pages. vacuum_freeze_table_age controls when VACUUM must use its aggressive strategy. Since the setting is applied against age(relfrozenxid), settings like vacuum_freeze_min_age may influence the exact cadence of aggressive vacuuming. Setting vacuum_freeze_table_age to 0 forces VACUUM to always use its aggressive strategy.

Note

In practice most tables require periodic aggressive vacuuming. However, some individual non-aggressive VACUUM operations may be able to advance relfrozenxid and/or relminmxid. This is most common in small, frequently modified tables, where VACUUM happens to scan all pages (or at least all pages not marked all-frozen) in the course of removing dead tuples.

The maximum time that a table can go unvacuumed is about two billion transactions. If it were to go unvacuumed for longer than that, the system will refuse to allocate new transaction IDs, temporarily rendering the database read-only. To ensure that this never happens, autovacuum is invoked on any table that might contain unfrozen rows with XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age. (This will happen even if autovacuum is disabled.)

In practice all anti-wraparound autovacuums will use VACUUM's aggressive strategy. This is assured because the effective value of vacuum_freeze_table_age is clamped to a value no greater than 95% of the current value of autovacuum_freeze_max_age. As a rule of thumb, vacuum_freeze_table_age should be set to a value somewhat below autovacuum_freeze_max_age, leaving enough gap so that a regularly scheduled VACUUM or an autovacuum triggered by inserts, updates and deletes is run in that window. Anti-wraparound autovacuums can be avoided altogether in tables that reliably receive some VACUUMs that use the aggressive strategy.
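
As one illustration (the values are arbitrary), if autovacuum_freeze_max_age had been raised to 400 million in postgresql.conf (a change that requires a server restart), vacuum_freeze_table_age could be raised along with it while still staying somewhat below it:

ALTER SYSTEM SET vacuum_freeze_table_age = 300000000;
SELECT pg_reload_conf();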

Note

There is no fundamental difference between a VACUUM run during anti-wraparound autovacuuming and a VACUUM that happens to use the aggressive strategy (whether run by autovacuum or manually issued).

A convenient way to examine information about relfrozenxid and relminmxid is to execute queries such as:

SELECT c.oid::regclass as table_name,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as xid_age,
       mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname,
       age(datfrozenxid) as xid_age,
       mxid_age(datminmxid)
FROM pg_database;

The xid_age column measures the number of transactions from the cutoff XID to the next unallocated transaction's XID. The mxid_age column measures the number of MultiXactIds from the cutoff MultiXactId to the next unallocated MultiXactId.

Tip

When the VACUUM command's VERBOSE parameter is specified, VACUUM prints various statistics about the table. This includes information about how relfrozenxid and relminmxid advanced, and the number of newly frozen pages. The same details appear in the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a VACUUM operation executed by autovacuum.
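
For example, for a hypothetical table named mytable:

VACUUM (VERBOSE) mytable;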

25.2.2.2. xidStopLimit mode #

If for some reason autovacuum utterly fails to advance any table's relfrozenxid or relminmxid, the system will begin to emit warning messages like this when the database's oldest XIDs reach forty million transactions from the wraparound point:

WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will eventually refuse to start any new transactions. This happens at the point that there are fewer than three million transactions left:

ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT:  Stop the postmaster and vacuum that database in single-user mode.

The three-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the required VACUUM commands. However, since the system will not execute commands once it has gone into the safety shutdown mode, the only way to do this is to stop the server and restart it in single-user mode to execute VACUUM. The shutdown mode is not enforced in single-user mode. See the postgres reference page for details about using single-user mode.

Anything that influences when and how relfrozenxid and relminmxid advance also directly affects the high-watermark storage overhead of historical transaction status information. This additional space overhead is usually a minor concern; it is noted here, for completeness, as one more downside of allowing the system to get close to xidStopLimit.

Historical Note

The term wraparound is historical and somewhat inaccurate: the system stops allocating new XIDs well before any existing data could be affected, so despite the wording of the error message, no data loss actually occurs at this point.

In emergencies, VACUUM will take extraordinary measures to avoid xidStopLimit mode. A failsafe mechanism is triggered when thresholds controlled by vacuum_failsafe_age and vacuum_multixact_failsafe_age are reached. The failsafe prioritizes advancing relfrozenxid and/or relminmxid as quickly as possible. Once the failsafe triggers, VACUUM bypasses all remaining non-essential maintenance tasks, and stops applying any cost-based delay that was in effect. Any Buffer Access Strategy in use will also be disabled.

25.2.3. Updating the Visibility Map #

Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). This has two purposes. First, vacuum itself can skip such pages on the next run, since there is nothing to clean up. Even VACUUMs that use the aggressive strategy can skip pages that are both all-visible and all-frozen (the visibility map keeps track of which pages are all-frozen separately).

Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
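
The relallvisible column of pg_class tracks (as of a table's last VACUUM or ANALYZE) how many of its pages are marked all-visible, which gives a rough sense of how effective index-only scans can be. A sketch of such a check:

SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 10;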

25.2.4. Truncating transaction status information #

As noted in Section 25.2.2.2, anything that influences when and how relfrozenxid and relminmxid advance will also directly affect the high watermark storage overhead needed to store historical transaction status information. For example, increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) will make the pg_xact and pg_commit_ts subdirectories of the database cluster take more space, because they store the commit status and (if track_commit_timestamp is enabled) timestamp of all transactions back to the datfrozenxid horizon (the earliest datfrozenxid in the entire cluster).

The commit status uses two bits per transaction. The default autovacuum_freeze_max_age setting of 200 million transactions translates to about 50MB of pg_xact storage. When track_commit_timestamp is enabled, about 2GB of pg_commit_ts storage will also be required.

MultiXactId status information is implemented as two separate SLRU storage areas: pg_multixact/members, and pg_multixact/offsets. There is no simple formula to determine the storage overhead per MultiXactId, since MultiXactIds have a variable number of member XIDs.

Truncation of transaction status information is only possible at the end of a VACUUM that advances relfrozenxid (in the case of pg_xact and pg_commit_ts) or relminmxid (in the case of pg_multixact/members and pg_multixact/offsets) of whatever table happened to have the oldest value in the cluster when the VACUUM began. This typically happens very infrequently, often during aggressive strategy VACUUMs of one of the database's largest tables.

25.2.5. Updating Planner Statistics #

The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance.

The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually-scheduled ANALYZE operations, particularly if it is known that update activity on a table will not affect the statistics of interesting columns. The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.

Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run ANALYZE on the parent table manually in order to keep the statistics up to date.
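
For example, a scheduled job could periodically run the following, where parent_table is a hypothetical inheritance or partitioned parent:

ANALYZE parent_table;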

As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there might be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a timestamp column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column might receive changes just as often, but the statistical distribution of its values probably changes relatively slowly.

It is possible to run ANALYZE on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. ANALYZE uses a statistically random sampling of the rows of a table rather than reading every single row.
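
For example, to analyze just two columns of a hypothetical table:

ANALYZE mytable (col_a, col_b);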

Tip

Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.
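
For example, with a hypothetical table and column:

ALTER TABLE mytable ALTER COLUMN status SET STATISTICS 500;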

Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
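
For example (the table, column, and function are illustrative), creating an expression index causes subsequent ANALYZE runs to gather statistics on the expression's values:

CREATE INDEX ON users (lower(email));
ANALYZE users;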

Tip

The autovacuum daemon does not issue ANALYZE commands for foreign tables, since it has no means of determining how often that might be useful. If your queries require statistics on foreign tables for proper planning, it's a good idea to run manually-managed ANALYZE commands on those tables on a suitable schedule.

Tip

The autovacuum daemon does not issue ANALYZE commands for partitioned tables. Inheritance parents will only be analyzed if the parent itself is changed; changes to child tables do not trigger autoanalyze on the parent table. If your queries require statistics on parent tables for proper planning, it is necessary to periodically run a manual ANALYZE on those tables to keep the statistics up to date.