| From: | Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> |
|---|---|
| To: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
| Cc: | exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19519: REPACK can fail due to missing chunk for toast value |
| Date: | 2026-07-03 06:45:42 |
| Message-ID: | CA+UBfan1QukfNDPvb2YpPG-ME4vUduBfbbe5uWptXDryZbtwVA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
The proposed fix (skipping own xmin while computing oldest xmin) did
solve the problem, but while testing this, I found another bug report
showing the same error: "missing chunks."
For example, this report [1] reproduces the same bug with a different
repro attached, but interestingly, this thread discussion also pointed
out that the bug is also reproducible with the CREATE INDEX command.
The following trace was reported by Alexander Lakhin there.
> The backtrace of the heap_fetch_toast_slice() call emitting error is:
> 2024-03-05 17:41:07.786 UTC|law|db10|65e75933.208c6b|ERROR: missing chunk number 0 for toast value 17314 in pg_toast_17289
> 2024-03-05 17:41:07.786 UTC|law|db10|65e75933.208c6b|BACKTRACE:
> heap_fetch_toast_slice at heaptoast.c:784:3
> toast_fetch_datum at detoast.c:379:2
> detoast_external_attr at detoast.c:54:12
> index_form_tuple_context at indextuple.c:111:21
> tuplesort_putindextuplevalues at tuplesortvariants.c:678:15
> _bt_spool at nbtsort.c:537:1
> _bt_build_callback at nbtsort.c:610:24
> heapam_index_build_range_scan at heapam_handler.c:1329:22
> table_index_build_scan at tableam.h:1781:9
> (inlined by) _bt_spools_heapscan at nbtsort.c:483:15
> btbuild at nbtsort.c:329:14
> index_build at index.c:3042:10
> index_create at index.c:1265:3
> DefineIndex at indexcmds.c:1166:3
Index building, together with lazy vacuum, can create similar
conditions as we saw with VACUUM FULL / CLUSTER / REPACK. In
heapam_index_build_range_scan(), we compute the OldestXmin, which
later is used to decide which tuples to index (in repack we use it to
decide which tuples to copy). I think we should also consider fixing
the index building path while addressing this bug.
```
OldestXmin = GetOldestNonRemovableTransactionId(heapRelation);
```
Can we safely use the same fix in the index build path too? Can we use
GetOldestNonRemovableTransactionIdforRewrite or something similar
here? Normal serial index builds use SnapShotAny and concurrent index
builds use MVCC, but the bug only exists in the serial index build
path.
Other than that, after applying your patch, the bug was not
reproducible with either this repro or the other report's repro [1] in
the rewrite path. However, the create index bug is still there. You
can use the following repro as mentioned in the thread [1].
> I've discovered that not only VACUUM FULL can stumble over such missing
> toast values. CREATE INDEX behaves similarly, as the following script
> shows:
> for ((c=1;c<=20;c++)); do createdb db$c; done
>
>
> for ((i=1;i<=100;i++)); do
> echo "iteration $i"
> for ((c=1;c<=20;c++)); do
> echo "\\d sometable" | psql -d db$c >psql-1-$c.log 2>&1 &
> echo "DROP TABLE IF EXISTS tbl;
> CREATE TABLE tbl (i int, t text);
> ALTER TABLE tbl ALTER COLUMN t SET STORAGE EXTERNAL;
> INSERT INTO tbl(i, t) VALUES (1, repeat('1234567890', 250));
> DELETE FROM tbl;
>
>
> VACUUM (VERBOSE) tbl;
> CREATE INDEX t_idx ON tbl(t);
> " | psql -d db$c >psql-2-$c.log 2>&1 &
> done
> wait
> grep 'missing chunk number' server.log && break;
> done
Regards,
Imran Zaheer
[1]: https://www.postgresql.org/message-id/flat/18351-f6e06364b3a2e669%40postgresql.org
On Wed, Jun 17, 2026 at 10:03 PM Srinath Reddy Sadipiralla
<srinath2133(at)gmail(dot)com> wrote:
>
> Hi Alexander,
>
> On Sun, Jun 14, 2026 at 10:37 PM PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> The following script:
>> createdb db1
>> createdb db2
>>
>> cat << EOF | psql db1
>> SET default_statistics_target = 1000;
>> CREATE TABLE t(i int, t text);
>> INSERT INTO t SELECT 1, g::text FROM generate_series(1, 50000) g;
>> ANALYZE t;
>> EOF
>>
>> cat << EOF | psql db2 &
>> BEGIN;
>> CREATE TABLE t(i int);
>> SELECT pg_sleep(3);
>> EOF
>>
>> sleep 1
>>
>> cat << EOF | psql db1
>> DROP TABLE t;
>> VACUUM pg_toast.pg_toast_2619;
>> REPACK;
>> EOF
>> wait
>>
>> triggers:
>> ERROR: missing chunk number 0 for toast value 16393 in pg_toast_2619
>
>
> Thanks for the report and clean reproducer , I can also reproduce this.
>
> I looked into this. It is not REPACK-specific, it reproduces with
> plain VACUUM FULL on back branches too, so it predates
> the REPACK command. AFAIU The root cause is a disagreement about
> the removal horizon between a table rewrite and an independent vacuum
> of that table's TOAST relation.
>
> A rewrite (cluster_rel -> copy_table_data, copy_heap_data) preserves live
> and RECENTLY_DEAD tuples, and detoasts a preserved tuple's external
> values to move them into the new TOAST table.Its removal cutoff comes
> from GetOldestNonRemovableTransactionId(), i.e. ComputeXidHorizons(),
> which includes the rewriting backend's own snapshot xmin.
> The rewrite holds an ordinary MVCC snapshot (taken so index expressions
> can be evaluated) and deliberately does not set PROC_IN_VACUUM. So
> The rewrite backend's snapshot xmin is the cluster-wide oldest
> running xid, which the db2 transaction pins. Since that backend is
> in db1 and lacks PROC_IN_VACUUM, its own xmin folds into the
> relation's data/catalog horizon -> OldestXmin sits behind the
> DROP, so the dead pg_statistic tuple is RECENTLY_DEAD and gets
> copied.
> The lazy vacuum of pg_toast_2619 sets PROC_IN_VACUUM, which excludes
> its own xmin from the horizon, and the db2 transaction, being in another database,
> does not constrain a non-shared relation's horizon. Its cutoff advances past the DROP
> and the chunks are removed.The two operations thus disagree about the same tuple,
> and the rewrite ends up fetching TOAST the lazy vacuum already reclaimed.
>
> Attached Fix:
> The rewrite is simply too conservative,AFAIK it's snapshot is only there to
> evaluate index expressions against other relations, and it is not a
> reader of the rewritten relation's historical rows, so its own xmin
> should not hold back that relation's removal horizon. Excluding it
> yields exactly the horizon the lazy vacuum uses.
> We cannot set PROC_IN_VACUUM on the rewrite, that would broadcast
> "ignore my xmin" to every backend and let other vacuums remove tuples in
> other relations that the index expressions may need (the documented
> reason VACUUM FULL does not set it). The exclusion has to be local to
> the rewrite's own horizon computation, so the patch adds
> GetOldestNonRemovableTransactionIdForRewrite(): ComputeXidHorizons()
> gains a flag to skip the calling backend (and to not publish the
> resulting, more-aggressive horizons into the shared GlobalVisState), and
> copy_table_data() uses it for OldestXmin.
>
> --
> Thanks :)
> Srinath Reddy Sadipiralla
> EDB: https://www.enterprisedb.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thom Brown | 2026-07-03 08:36:09 | Re: EXPLAIN (VERBOSE) fails with for JSON_ARRAYAGG/JSON_OBJECTAGG + window function |
| Previous Message | David Rowley | 2026-07-03 03:38:34 | Re: BUG #19533: Wrong results from WindowAgg run-condition pushdown on count() with EXCLUDE CURRENT ROW |