From 205e050b87870450084a55b4f42629ef27668444 Mon Sep 17 00:00:00 2001
From: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Date: Mon, 16 Mar 2026 13:32:45 +0300
Subject: [PATCH] Mark modified FSM buffer as dirty during recovery.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The XLogRecordPageWithFreeSpace function updates free space map (FSM)
data while replaying data-level WAL records during recovery. If an FSM
block is updated, it needs to be marked as modified, and currently this
is done with a MarkBufferDirtyHint call (as in all other cases of
modifying FSM data). However, in the recovery context this function
does nothing if checksums are enabled. The assumption is that a page
should not be dirtied by hint-only changes during recovery, to protect
against torn pages, as no new WAL can be generated at this point to
store an FPI.

This logic does not fit the FSM case well, as FSM blocks can simply be
zeroed if a checksum mismatch is detected. Currently, changes to an FSM
block can be lost if changes to that particular block occur rarely
enough to allow its eviction from the cache. To be persisted, a
modification has to be made while the FSM block is still kept in
buffers and marked as dirty after receiving its FPI. If the block has
already been cleaned, the change will not be persisted, so stored FSM
blocks may remain in an obsolete state.

If a large number of discrepancies between the data in leaf FSM blocks
and the actual data blocks accumulates on the replica, this can cause
significant delays for insert operations after a switchover. Such an
insert may need to visit many data blocks marked in the FSM as having
enough space, only to discover that this information is incorrect and
the FSM records need to be fixed. In a heavily trafficked insert-only
table with many concurrent clients performing inserts, this has been
observed to cause stalls of several seconds, causing visible
application malfunction. The desire to avoid such cases was the reason
behind commit ab7dbd681, which introduced an update of FSM data during
the heap_xlog_visible invocation. However, an update to the FSM data on
the standby can be lost due to the missing 'dirty' flag, so it is still
possible for a large number of FSM records to contain incorrect data.
Note that a zeroed FSM page (as a result of a checksum mismatch) is
preferable in this case, as a zero value is interpreted as indicating
full data blocks, and the inserter is simply routed to the next FSM
block or to the end of the table.

Given that the FSM is prepared to handle torn page writes and
XLogRecordPageWithFreeSpace is called only during recovery, there
seems to be no reason to use MarkBufferDirtyHint here instead of a
regular MarkBufferDirty call.
---
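Note for reviewers: the behavior difference described above can be
illustrated with a toy model. This is only a sketch of the decision
logic as stated in the commit message, not the actual bufmgr.c code.
All names here (ToyBuffer, ToyServerState, toy_mark_dirty_hint,
toy_mark_dirty, toy_replay_fsm_update) are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a shared buffer. */
typedef struct ToyBuffer
{
	bool		dirty;
} ToyBuffer;

/* Hypothetical stand-in for the relevant server state. */
typedef struct ToyServerState
{
	bool		in_recovery;
	bool		checksums_enabled;
} ToyServerState;

/*
 * Models the hint path as described in the commit message: with checksums
 * enabled, a hint-only change made during recovery does not dirty the page.
 */
static void
toy_mark_dirty_hint(const ToyServerState *state, ToyBuffer *buf)
{
	assert(state != NULL && buf != NULL);
	if (state->checksums_enabled && state->in_recovery)
		return;					/* change stays in memory only */
	buf->dirty = true;
}

/* Models the regular path: the buffer is always marked dirty. */
static void
toy_mark_dirty(ToyBuffer *buf)
{
	assert(buf != NULL);
	buf->dirty = true;
}

/*
 * Simulates one FSM update during replay and reports whether the buffer
 * ended up dirty, i.e. whether the change would eventually be written out.
 */
static bool
toy_replay_fsm_update(bool checksums_enabled, bool use_hint)
{
	ToyServerState state = {.in_recovery = true,
							.checksums_enabled = checksums_enabled};
	ToyBuffer	buf = {.dirty = false};

	if (use_hint)
		toy_mark_dirty_hint(&state, &buf);
	else
		toy_mark_dirty(&buf);
	return buf.dirty;
}
```

In this model, toy_replay_fsm_update(true, true) leaves the buffer
clean, which corresponds to the lost FSM update, while
toy_replay_fsm_update(true, false) marks it dirty, which corresponds
to the behavior after this patch.
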
 src/backend/storage/freespace/freespace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 40d67a96178..1631bc79872 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -231,8 +231,18 @@ XLogRecordPageWithFreeSpace(RelFileLocator rlocator, BlockNumber heapBlk,
 	if (PageIsNew(page))
 		PageInit(page, BLCKSZ, 0);
 
+	/*
+	 * Changes to the FSM are usually flagged with MarkBufferDirtyHint, but
+	 * during recovery that call does nothing if checksums are enabled. The
+	 * assumption is that hint-only changes must not dirty a page during
+	 * recovery, to protect against torn pages, as no new WAL can be
+	 * generated at this point to store an FPI. This is not relevant for the
+	 * FSM, as its blocks are simply zeroed on a checksum mismatch. So use a
+	 * regular MarkBufferDirty here to actually mark the FSM block as
+	 * modified during recovery; otherwise changes to the FSM may be lost.
+	 */
 	if (fsm_set_avail(page, slot, new_cat))
-		MarkBufferDirtyHint(buf, false);
+		MarkBufferDirty(buf);
 	UnlockReleaseBuffer(buf);
 }
 
-- 
2.53.0

