VACUUM FULL or CREATE INDEX fails with error: missing chunk number 0 for toast value XXX

From: Тестова Екатерина <e(dot)testova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: VACUUM FULL or CREATE INDEX fails with error: missing chunk number 0 for toast value XXX
Date: 2026-05-18 04:20:55
Message-ID: 1779078055.630771314@fmail2.qdit
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Bug ERROR: missing chunk number 0 for toast value.
The bug #18351 was previously reported in https://www.postgresql.org/message-id/flat/18351-f6e06364b3a2e669%40postgresql.org  but not resolved.
I have made reproducing easier, figured out the cause of the bug, and developed
a prototype patch, though it has known issues I'd like feedback on.
 
Reproduction
============
Tested on PostgreSQL 17.10
  1) Terminal 1:
     psql -d postgres
  2) Terminal 1:
     DROP TABLE IF EXISTS tbl;
     CREATE TABLE tbl (i int, t text);
     CREATE INDEX ON tbl (i);
     ALTER TABLE tbl ALTER COLUMN t SET STORAGE EXTERNAL;
     INSERT INTO tbl(i, t) VALUES (1, repeat('1234567890', 250));
  3) Terminal 2:
     psql -d postgres
  4) Terminal 2:
     BEGIN;
     SELECT txid_current();
  5) Terminal 3:
     createdb d1
  6) Terminal 3:
     psql -d d1
  7) Terminal 3:
     BEGIN;
     SELECT txid_current();
  8) Terminal 1:
     DELETE FROM tbl WHERE i = 1;
  9) Attach gdb to the backend from terminal 1
  10) Set breakpoint at vacuum_rel(toast_relid, NULL, &toast_vacuum_params,
      bstrategy); (line 2300 in src/backend/commands/vacuum.c)
  11) Terminal 1:
      VACUUM(VERBOSE) tbl;
  12) Wait for the breakpoint to be hit
  13) Terminal 2:
      COMMIT; (or just \q)
  14) Detach the process in gdb
  15) Terminal 1:
      CREATE INDEX ON tbl(t);
 
This should produce: ERROR: missing chunk number 0 for toast value …
 
The bug stems from different horizons in VACUUM table and its TOAST.
Two key mechanisms are involved:
1) Horizon computation (ComputeXidHorizons, called by
   GetOldestNonRemovableTransactionId): iterates over processes in the
   procarray, but skips those with PROC_IN_VACUUM set, and only considers
   processes in the same database selecting the minimum for the
   data_oldest_nonremovable. This value also feeds into
   GlobalVisState->definitely_needed (which can only grow).
2) Snapshot computation (GetSnapshotData): also skips PROC_IN_VACUUM
   processes, but does NOT filter by database — transactions in all databases
   contribute to the snapshot's xmin.
 
During the main table's VACUUM, a transaction in the same database holds the
data horizon up, so the tuple survives (it is RECENTLY_DEAD) — but by the
time we vacuum the TOAST table, that transaction has committed. The TOAST
tuples get removed, because with no other active processes in this database
OldestXmin become max computed (that is >xmax).
Later, CREATE INDEX on the main table computes its own OldestXmin. Our process
is no longer in VACUUM, so its xmin (carried over from the snapshot - minimum
txid obtained from backend in another database) is now considered. This xmin
is less than the dead tuple's xmax, so HeapTupleSatisfiesVacuum classifies
it as RECENTLY_DEAD rather than DEAD. CREATE INDEX tries to fetch the TOAST
value — but it's already gone.

Prototype patch
===============
The core idea: when vacuuming a TOAST table, reuse the OldestXmin that was
computed for the parent table, rather than computing a fresh one that
may have advanced past it.
 
The prototype patch adds two fields to VacuumParams:
  - cached_parent_oldest_xmin: stores the OldestXmin from the parent table
  - cached_parent_cutoffs_valid: indicates the cached value is usable
 
In heap_vacuum_rel(), if we're vacuuming a TOAST table and the parent's
OldestXmin is available, we use it instead of calling
GetOldestNonRemovableTransactionId() again. This prevents the TOAST vacuum
from removing tuples (based on OldestXmin and definitely_needed) that the
main table's vacuum considered still visible.
 
Known issues
============
Make check fails. One of the problems is cutoff for removing and freezing
tuples is far in the past. This causes assertion failures and incorrect
freezing behavior.
 
Alternative approach
====================
An alternative would be to add a definitely_needed check alongside
OldestXmin in create index, so that a tuple classified as RECENTLY_DEAD
by OldestXmin would be reclassified as DEAD if definitely_needed says it's safe
to remove. However, this adds an extra visibility check during CREATE INDEX,
which could cause a performance regression.
I'd appreciate comments on whether the "cache parent OldestXmin for TOAST
vacuum" approach worth pursuing, despite the freezing complications?
Or is there a cleaner way?
 
Best regards,
Ekaterina Testova, Postgres Professional

Attachment Content-Type Size
caching_oldest_xmin.patch text/x-diff 3.6 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2026-05-18 04:49:08 Re: Logical Replication - revisit `is_table_publication` function implementation
Previous Message Michael Paquier 2026-05-18 04:19:02 Re: Fix small issues of pg_restore_extended_stats()