> On Jan 14, 2026, at 16:38, Robert Haas wrote:
>
> On Wed, Jan 7, 2026 at 9:50 AM Oleg Tkachenko wrote:
>> Both forks have the same limit, which looks wrong.
>> So I checked the WAL files to see what really happened with the VM fork.
>> I did not find any "truncate" records for the VM file.
>> I only found this record for the main fork
>> (actually, the fork isn't mentioned at all):
>>
>> rmgr: Storage len (rec/tot): 46/46, tx: 759, lsn: 0/4600D318,
>> prev 0/4600B2C8, desc: TRUNCATE base/5/16384 to 131073 blocks flags 7
>
> Flags 7 for Storage/TRUNCATE means all forks:
>
> #define SMGR_TRUNCATE_HEAP 0x0001
> #define SMGR_TRUNCATE_VM 0x0002
> #define SMGR_TRUNCATE_FSM 0x0004
> #define SMGR_TRUNCATE_ALL \
>     (SMGR_TRUNCATE_HEAP|SMGR_TRUNCATE_VM|SMGR_TRUNCATE_FSM)
>
> I think this comes from RelationTruncate(), which does indeed set
> xlrec.flags = SMGR_TRUNCATE_ALL. It seems bananas to me to use the
> same count of blocks for all forks, but that is the way the code
> treats it. RelationTruncate() goes on to call
> smgrtruncate(RelationGetSmgr(rel), forks, nforks, old_blocks, blocks),
> which iterates over all forks and uses the same block number for all
> of them; smgr_redo() also does this, and SummarizeSmgrRecord() also
> calls BlockRefTableSetLimitBlock() for each relevant fork with that
> same block number. This really makes no sense to me unless the block
> count happens to be zero, but AFAICT all the code agrees that this is
> how it's supposed to work.
>
> I think the problem here is that the incremental backup code makes the
> apparently naive assumption that the purpose of truncation is to make
> things shorter. In this case, all forks were truncated to a random
> length that was well in excess of the length of the VM fork, and in
> pg_combinebackup, find_reconstructed_block_length() interprets that to
> mean that the output file should be at least as long as the truncation
> length.
> I am at present uncertain whether that can be safely changed without
> breaking anything else. I don't think that what we're doing is unsafe
> in the sense of producing corrupted data, because a bunch of trailing
> blocks of zeroes are harmless, but it's obviously quite problematic if
> it causes a huge disk space blowup, as it did here. So I think
> something should be done about this, but the original issue you
> reported is more urgent.
>
> So my suggestion is to change the test so that it produces a file that
> is the same small size on every platform. On most platforms, this will
> be 1 segment. On the CI platform where we set the segment size to 6,
> it will be multiple segments, and on that platform only it will
> effectively test for this bug. If you do that, then we can commit the
> fix for the original problem. We (or someone else) can then look into
> what needs to be done about the excessive zero-padding as a separate
> issue.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com