Missing FSM Update when Updating VM On-access

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Missing FSM Update when Updating VM On-access
Date: 2026-06-29 22:32:47
Message-ID: CAAKRu_b2StZrEC=HmW8LePuQbczyFRnfs8qTAJwn_=W76-y24w@mail.gmail.com
Lists: pgsql-hackers

A few days ago, during a conversation about another patch, Andres and
I realized that my patch to set the VM on-access could lead to
out-of-date freespace maps. Pages set all-visible in the VM can be
skipped by vacuum, which would then not update the FSM for that page.

The solution is to update the FSM during on-access pruning whenever we
also updated the VM.

This can, of course, add overhead. The needed FSM page will often be
cached, but there is a cost to pinning it and, if the page's free space
category has changed, to dirtying it.

Dirtying it isn't a big concern because RecordPageWithFreeSpace() only
dirties the FSM when the freespace category actually changes -- in
which case it's worth it. And the FSM update is not WAL-logged, so the
write itself is cheap.
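
To make the "only dirties when the category changes" point concrete,
here is a minimal toy model, not the real freespace.c code. It assumes
the default 8 kB block size and one byte (256 categories) per heap
page; ToyFSM, record() and space_to_category() are hypothetical names,
and the real mapping treats the topmost categories specially.

```python
BLCKSZ = 8192                             # assumed default block size
FSM_CATEGORIES = 256                      # one byte of FSM per heap page
FSM_CAT_STEP = BLCKSZ // FSM_CATEGORIES   # 32 bytes per category


def space_to_category(avail: int) -> int:
    """Map free bytes on a heap page to an FSM category (simplified)."""
    return min(avail // FSM_CAT_STEP, FSM_CATEGORIES - 1)


class ToyFSM:
    """Hypothetical stand-in for the FSM: block number -> category."""

    def __init__(self):
        self.cat = {}
        self.dirtied = 0   # how many times the map was actually written

    def record(self, blkno: int, avail: int) -> bool:
        """Store the category; return True only if the map was dirtied."""
        new_cat = space_to_category(avail)
        if self.cat.get(blkno, 0) == new_cat:
            return False   # same category: nothing to write
        self.cat[blkno] = new_cat
        self.dirtied += 1
        return True


fsm = ToyFSM()
fsm.record(7, 100)   # category 3, dirties the map
fsm.record(7, 120)   # still category 3, no write
fsm.record(7, 128)   # category 4, dirties the map again
print(fsm.dirtied)   # 2
```

In this model a change in free space that stays within one 32-byte
category costs only the lookup, never a write.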

I concocted the worst-case scenario I could come up with -- a relation
where every single page had to be set all-visible in the VM and needed
an FSM update, and the query returns no rows and deforms no tuples. In
this case, there was a slowdown of a few percentage points due to the
extra buffer pinning and lock acquire/release. (Repro at end of email)

I tried to think of some heuristics so that we could limit when we did
the FSM pinning and locking, but none seemed very good. We could check
if the new amount of free space is bigger than an FSM category step
(32 bytes), but that doesn't help us if we are correcting an FSM
overestimation. This could happen because inserts don't update the FSM
until the tuple being inserted doesn't fit on the target page.
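
A small sketch of why that heuristic falls short, under one reading of
it (skip the FSM update unless the page's free space after pruning
exceeds one category step). The function names and the byte counts are
made up for illustration.

```python
FSM_CAT_STEP = 32   # bytes per FSM category with 8 kB blocks


def category(avail: int) -> int:
    """Simplified free-bytes -> FSM category mapping."""
    return min(avail // FSM_CAT_STEP, 255)


def heuristic_says_update(free_after_prune: int) -> bool:
    """Hypothetical heuristic: only touch the FSM if there is more
    than one category step of free space on the page."""
    return free_after_prune > FSM_CAT_STEP


def fsm_entry_is_stale(recorded_cat: int, free_after_prune: int) -> bool:
    """True if the stored category no longer matches the page."""
    return recorded_cat != category(free_after_prune)


# The FSM last recorded ~4000 free bytes. Inserts then filled the page
# without updating the FSM, and pruning leaves only 20 bytes free.
recorded = category(4000)
free_now = 20

print(heuristic_says_update(free_now))         # False: update skipped
print(fsm_entry_is_stale(recorded, free_now))  # True: overestimate stays
```

Here the heuristic skips exactly the update that would have corrected
the overestimate.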

I also thought of caching the pinned FSM page in the scan descriptor
like we do with the VM page. This doesn't work as nicely because each
FSM page covers fewer heap pages than a VM page does. Also, the pinning
and unpinning all happen inside of the FSM API functions.
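
For a rough sense of the coverage difference, here is back-of-envelope
arithmetic for the default 8 kB block size. The layout constants (a
24-byte page header, two VM bits per heap page, a 4-byte field ahead of
the FSM node array) are assumptions recalled from the source headers,
so treat the exact figures as approximate.

```python
BLCKSZ = 8192
PAGE_HEADER = 24   # assumed MAXALIGN'd page header size

# Visibility map: assumed two bits per heap page.
VM_BITS_PER_HEAP_PAGE = 2
vm_heap_pages = (BLCKSZ - PAGE_HEADER) * 8 // VM_BITS_PER_HEAP_PAGE

# FSM: each page holds a binary tree with one byte per node; only the
# leaf nodes correspond to heap pages.
FSM_NEXT_SLOT_FIELD = 4   # assumed int stored before the node array
fsm_nodes = BLCKSZ - PAGE_HEADER - FSM_NEXT_SLOT_FIELD
fsm_nonleaf_nodes = BLCKSZ // 2 - 1
fsm_heap_pages = fsm_nodes - fsm_nonleaf_nodes

print(vm_heap_pages)                     # 32672
print(fsm_heap_pages)                    # 4069
print(vm_heap_pages // fsm_heap_pages)   # 8
```

Under these assumptions a sequential scan would cross onto a new FSM
page about eight times as often as onto a new VM page, so a cached pin
would be reused for correspondingly fewer heap pages.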

Therefore, I think the best option is the simplest -- if we set the
page all-visible, also see if we should update the FSM.
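
The interaction can be sketched with a toy simulation. This is not
PostgreSQL code: Page, on_access_prune(), vacuum() and run() are
hypothetical, the VM is reduced to one flag per page, and vacuum's real
skipping logic is more nuanced than "skip every all-visible page".

```python
from dataclasses import dataclass

FSM_CAT_STEP = 32   # bytes per FSM category with 8 kB blocks


@dataclass
class Page:
    free: int                   # actual free bytes on the heap page
    dead: int = 0               # bytes reclaimable by pruning
    all_visible: bool = False   # toy stand-in for the VM bit


def category(avail: int) -> int:
    return min(avail // FSM_CAT_STEP, 255)


def on_access_prune(pages, fsm, blkno, update_fsm):
    """Prune one page during a scan and set it all-visible."""
    page = pages[blkno]
    page.free += page.dead
    page.dead = 0
    page.all_visible = True
    if update_fsm:   # the proposed behaviour
        fsm[blkno] = category(page.free)


def vacuum(pages, fsm):
    """Toy vacuum: skips all-visible pages and their FSM entries."""
    for blkno, page in enumerate(pages):
        if page.all_visible:
            continue
        page.free += page.dead
        page.dead = 0
        page.all_visible = True
        fsm[blkno] = category(page.free)


def run(update_fsm):
    pages = [Page(free=64, dead=400) for _ in range(4)]
    fsm = {b: category(p.free) for b, p in enumerate(pages)}
    for b in range(len(pages)):
        on_access_prune(pages, fsm, b, update_fsm)
    vacuum(pages, fsm)
    return [fsm[b] for b in range(len(pages))]


print(run(update_fsm=False))   # [2, 2, 2, 2]: 400 freed bytes invisible
print(run(update_fsm=True))    # [14, 14, 14, 14]
```

Without the FSM update, the space freed by on-access pruning stays
invisible to inserters in this model until something else refreshes the
entry.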

Note that this fix is only needed for the primary -- when pruning set
the page all-visible on-access and emitted a WAL record for it, the
standby was already updating the FSM while replaying the prune record
(in heap_xlog_prune_freeze() -> XLogRecordPageWithFreeSpace()).

And, finally, we do not have to worry about vacuuming the FSM
on-access because vacuum will still vacuum the FSM over ranges of the
relation's pages -- regardless of which heap pages it skipped.

Proposed patch attached. This requires a backpatch to 19.

Repro:
CREATE TABLE t (id int, pad text) WITH (autovacuum_enabled=off);
-- 170000 is ~10,000 pages of 420-byte rows
INSERT INTO t SELECT g, repeat('x',420) FROM generate_series(1,170000);
VACUUM (FREEZE) t;
-- create one removable dead tuple per page by shrinking the first row per page
UPDATE t SET pad = repeat('y',5)
WHERE id IN (SELECT min(id) FROM t GROUP BY (ctid::text::point)[0]);
-- advance the xmin horizon (so previously created dead tuples are
-- removable on-access)
CREATE TABLE dummy (a int);
-- on-access prune during scan sets every page all-visible and updates FSM
SELECT 1 FROM t OFFSET 10000000;
-- this will show a different number before and after my patch
SELECT sum(avail) FROM pg_freespace('t');

- Melanie

Attachment: v1-0001-Update-FSM-after-updating-VM-on-access.patch (text/x-patch, 2.7 KB)
