TOAST table vacuum truncation parameter inheritance bug (?) in autovacuum

From: Shayon Mukherjee <shayonj(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: TOAST table vacuum truncation parameter inheritance bug (?) in autovacuum
Date: 2025-07-15 12:09:57
Message-ID: CANqtF-pLHBDdNM-DifXQqVq+VWA3DTYOx4+3m2+TFJ1zDOZETg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I'd like to discuss an issue I observed where TOAST tables do not properly
inherit `vacuum_truncate` settings from their parent tables during
autovacuum operations, leading to unexpected and prolonged truncation
phases that can cause significant lock contention on production systems. I
discussed a nature of this issue on a related topic here [1].

Based on the documentation [2], my understanding is that TOAST tables
should inherit the `vacuum_truncate` setting from their parent tables.
However, I've observed that this is not the case.

> If a table parameter value is set and the equivalent toast. parameter is
not, the TOAST table will use the table's parameter value. [2]

Recap of the production incident

We experienced this issue directly where an autovacuum on a TOAST table
took approximately 5 hours despite the parent table having
`vacuum_truncate=false` explicitly set. One important note here is that I
added the vacuum_truncate=false to the parent table _after_ it was created
via `ALTER TABLE ... SET (vacuum_truncate = false)`.

During the autovacuum, the TOAST table performed truncation anyway which
lead to locks on the reader and impacting queries basically. Example:

```
automatic vacuum of table "tines.pg_toast.pg_toast_27098": index scans: 1
pages: 12730516 removed, 421195 remain, 0 skipped due to pins, 12720236
skipped frozen
tuples: 4575083 removed, 26717710 remain, 21498 are dead but not yet
removable
removable cutoff: 173810717, which was 5 XIDs old when operation ended
new relfrozenxid: 173810722, which is 5 XIDs ahead of previous value
frozen: 150 pages from table (0.04% of total) had 26717710 tuples frozen
visibility map: 421045 pages set all-visible, 421045 pages set all-frozen
index scan needed: 151751 pages from table (36.06% of total) had 4629490
dead item identifiers removed
index "pg_toast_27098_index": pages: 303420 in total, 14038 newly deleted,
298668 currently deleted, 298418 reusable
avg read rate: 8.730 MB/s, avg write rate: 2.140 MB/s
buffer usage: 5132686 hits, 25714763 misses, 2140873 dirtied
WAL usage: 2845123 records, 1285947 full page images, 458923847 bytes
system usage: CPU: user: 327.79 s, system: 251.11 s, elapsed: 18722.21 s
```

After investigating the code, I discovered that TOAST tables do not
properly inherit vacuum_truncate settings from their parent tables during
autovacuum operations. The issue manifests in two scenarios per my
understanding and testing:

1. Manual VACUUM: The toast_vacuum_params structure is copied early in
vacuum_rel() before the main table processes its vacuum options, so TOAST
tables receive the default VACOPT_TERNARY_DEFAULT value rather than the
parent's final decision.

2. Autovacuum: TOAST tables are treated as completely separate vacuum
operations in the autovacuum launcher, so they never consult their parent
table's settings. They only use explicit toast.vacuum_truncate settings if
present on the parent table, but when no such explicit TOAST-specific
setting exists, they fall back to the global vacuum_truncate default rather
than inheriting the parent table's vacuum_truncate setting.

My understanding of how the inheritance should work (based on the
documentation) is that - if a parent table has toast.vacuum_truncate
explicitly set, that value is used correctly for the TOAST table. However,
when no toast.vacuum_truncate is specified, the TOAST table should inherit
the parent's vacuum_truncate setting. Currently, this second case fails
during autovacuum operations, causing TOAST tables to use the global
default instead of the parent's setting. Which then results in the lock
contention issue that comes with truncation that I mentioned above.

To validate this understanding, I added some debug logging to trace vacuum
truncate decisions and was able to reproduce the issue consistently. Here's
what the logs showed before the fix using the script I attached where I set
the parent table to have `vacuum_truncate=false` and create & delete some
dummy data while having the autovacuum trigger frequently:

```
LOG: VACUUM TRUNCATE DEBUG: relation "toast_truncate_test" has explicit
vacuum_truncate=false
LOG: VACUUM TRUNCATE DEBUG: relation "toast_truncate_test" final decision:
params.truncate=DISABLED
automatic vacuum: pages: 0 removed, 671 remain

LOG: VACUUM TRUNCATE DEBUG: relation "pg_toast_327681" using global default
vacuum_truncate=true
LOG: VACUUM TRUNCATE DEBUG: relation "pg_toast_327681" final decision:
params.truncate=ENABLED
automatic vacuum: pages: 92963 removed, 139458 remain
```

To see if this is the case, I implemented a very
POC(proof-of-concept)-level patch that would address both scenarios:

1. For manual VACUUM: Modified `vacuum_rel()` to pass the final truncation
decision to TOAST tables after the main table processes its options.

2. For autovacuum: Added logic in the vacuum truncate decision code to
detect when we're processing a TOAST table and look up the parent table's
`vacuum_truncate` setting when the TOAST table itself has no explicit
setting.

The key changes are in `src/backend/commands/vacuum.c`:

- Modified parameter passing to allow updating vacuum params after initial
setup
- Added parent table lookup logic using `pg_class.reltoastrelid`
- Ensured TOAST tables inherit parent settings when they have
`VACOPT_TERNARY_DEFAULT`

After applying the fix, the logs show correct inheritance:

```
LOG: VACUUM TRUNCATE DEBUG: relation "toast_truncate_test" final decision:
params.truncate=DISABLED
automatic vacuum: pages: 0 removed, 671 remain

LOG: VACUUM TRUNCATE DEBUG: relation "pg_toast_327681" inheriting
vacuum_truncate=false from parent OID 327681
LOG: VACUUM TRUNCATE DEBUG: relation "pg_toast_327681" final decision:
params.truncate=DISABLED
automatic vacuum: pages: 0 removed, 139458 remain
```

The patch isn't quite production ready I would say and I mostly used it as
a way to learn more about the codebase and the internals of the autovacuum
process. I have attached the patch and the script I used to test it.

What I am curious about is - does this report of the bug make sense and if
the patch is headed in the right direction? Or is the issue here with the
documentation and its actually intended behavior that autovacuum should not
inherit the parent table's vacuum_truncate setting when working on TOAST
tables?

My preference would be that it works the way described in the
documentation. And if that approach is acceptable, I'm particularly
interested in feedback on whether the parent table lookup logic is
appropriate for this use case, and if there are any edge cases I should
consider. I haven't yet paid close attention to the patch around
opening/closing tables and indexes, so that will be my next step if the
approach seems reasonable. Happy to iterate as we go.

Thanks for your time and consideration.

Shayon

[1]
https://www.postgresql.org/message-id/CANqtF-qDGYhYDcpg3PEeDrXMmuJZJGTAeT0mJx0KrN%2BkVikZig%40mail.gmail.com
[2]
https://www.postgresql.org/docs/current/sql-createtable.html#RELOPTION-VACUUM-TRUNCATE

Attachment Content-Type Size
v1-0001-Vacuum-truncate-inheritance-support.patch application/octet-stream 5.7 KB
toast_autovacuum_test_fixed.sql application/octet-stream 3.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shayon Mukherjee 2025-07-15 12:19:08 Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX
Previous Message Amit Kapila 2025-07-15 11:31:17 Re: Logical replication prefetch