Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: 2025-09-08 22:28:46
Message-ID: CAAKRu_Y7X=0UAQa5b_2Z20z5+UPBtDbjazYD9228jmj-d9NpQA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 8, 2025 at 4:15 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Reviewing 0003:
>
> Locking a buffer in a critical section violates the order of
> operations proposed in the 'Write-Ahead Log Coding' section of
> src/backend/access/transam/README.

Right, I noticed that some other callers of visibilitymap_set() (like
lazy_scan_new_or_empty()) do call it in a critical section (and it
exclusive-locks the VM page), so I thought perhaps it was better to
keep this operation as close as possible to where we update the VM
(similar to how it is in master in visibilitymap_set()).

But, I think you're right that maintaining the order of operations
proposed in transam/README is more important. As such, in attached
v11, I've modified this patch and the other patches where I replace
visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
lock the vmbuffer before the critical section.
visibilitymap_set_vmbits() asserts that we have the vmbuffer
exclusively locked, so we should be good.
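
A minimal sketch of that ordering, written as a self-contained toy rather
than real backend code. ToyBuffer, toy_lock_exclusive(), toy_set_vmbits()
and toy_set_vm_logged() are hypothetical stand-ins, not PostgreSQL
functions; the numbered steps are assumed to follow the list in the
'Write-Ahead Log Coding' section of transam/README.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a shared buffer; not PostgreSQL's Buffer type. */
typedef struct ToyBuffer
{
	bool		exclusively_locked;
	bool		dirty;
	uint8_t		vmbits;
	uint64_t	lsn;
} ToyBuffer;

static int	toy_crit_depth = 0;		/* models the critical-section depth */
static uint64_t toy_next_lsn = 1;	/* models the next WAL insert position */

static void
toy_lock_exclusive(ToyBuffer *buf)
{
	/* Taking a lock inside a critical section is what the review flagged. */
	assert(toy_crit_depth == 0);
	buf->exclusively_locked = true;
}

static void
toy_unlock(ToyBuffer *buf)
{
	buf->exclusively_locked = false;
}

static void
toy_set_vmbits(ToyBuffer *vmbuf, uint8_t flags)
{
	/* Like the assertion described above: the caller must hold the lock. */
	assert(vmbuf->exclusively_locked);
	vmbuf->vmbits |= flags;
}

uint64_t
toy_set_vm_logged(ToyBuffer *vmbuf, uint8_t flags)
{
	uint64_t	lsn;

	toy_lock_exclusive(vmbuf);		/* 1. lock before the critical section */
	toy_crit_depth++;				/* 2. enter the critical section */
	toy_set_vmbits(vmbuf, flags);	/* 3. apply the change */
	vmbuf->dirty = true;			/* 4. mark the buffer dirty */
	lsn = toy_next_lsn++;			/* 5. "insert" WAL, stamp the page LSN */
	vmbuf->lsn = lsn;
	toy_crit_depth--;				/* 6. leave the critical section */
	toy_unlock(vmbuf);				/* 7. unlock */
	return lsn;
}
```

Moving the toy_lock_exclusive() call below the toy_crit_depth++ line trips
the first assertion, which is the ordering problem in miniature.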

> + * Now read and update the VM block. Even if we skipped updating the heap
> + * page due to the file being dropped or truncated later in recovery, it's
> + * still safe to update the visibility map. Any WAL record that clears
> + * the visibility map bit does so before checking the page LSN, so any
> + * bits that need to be cleared will still be cleared.
> + *
> + * It is only okay to set the VM bits without holding the heap page lock
> + * because we can expect no other writers of this page.
>
> The first paragraph of this paraphrases a similar content in
> xlog_heap_visible(), but I don't see the variation in phrasing as an
> improvement.

The only difference is I replaced the phrase "LSN interlock" with
"being dropped or truncated later in recovery" -- which is more
specific and, I thought, clearer. Without this comment, it took me
some time to understand the scenarios that might lead us to skip
updating the heap block. heap_xlog_visible() has cause to describe
this situation in an earlier comment -- which is why I think the LSN
interlock comment is less confusing there.
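
The scenario that comment describes can be sketched with a toy replay
model. Everything here is hypothetical (ToyPage, toy_read_for_redo() and
the two toy_redo_* functions are not PostgreSQL code); toy_read_for_redo()
only loosely imitates the BLK_NEEDS_REDO / BLK_DONE / BLK_NOTFOUND results
of XLogReadBufferForRedo(), and the assumption that a clearing record
touches the VM before looking at the heap page LSN is taken from the
comment quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ALL_VISIBLE 0x01

typedef enum ToyRedoAction
{
	TOY_BLK_NEEDS_REDO,			/* page is older than the record */
	TOY_BLK_DONE,				/* page already reflects the record */
	TOY_BLK_NOTFOUND			/* block dropped or truncated later */
} ToyRedoAction;

typedef struct ToyPage
{
	bool		exists;
	uint64_t	lsn;
	uint8_t		flags;
} ToyPage;

static ToyRedoAction
toy_read_for_redo(const ToyPage *page, uint64_t record_lsn)
{
	if (!page->exists)
		return TOY_BLK_NOTFOUND;
	if (page->lsn >= record_lsn)
		return TOY_BLK_DONE;
	return TOY_BLK_NEEDS_REDO;
}

/* Replay of a record that sets the all-visible bit. */
void
toy_redo_set_all_visible(ToyPage *heap, ToyPage *vm, uint64_t record_lsn)
{
	assert(heap != NULL && vm != NULL);

	if (toy_read_for_redo(heap, record_lsn) == TOY_BLK_NEEDS_REDO)
	{
		heap->flags |= TOY_ALL_VISIBLE;
		heap->lsn = record_lsn;
	}

	/* The VM block is handled even if the heap block was skipped. */
	if (toy_read_for_redo(vm, record_lsn) == TOY_BLK_NEEDS_REDO)
	{
		vm->flags |= TOY_ALL_VISIBLE;
		vm->lsn = record_lsn;
	}
}

/* Replay of a later record that clears the bit. */
void
toy_redo_clear_all_visible(ToyPage *heap, ToyPage *vm, uint64_t record_lsn)
{
	assert(heap != NULL && vm != NULL);

	/* The VM bit is cleared before the heap page LSN is consulted. */
	if (vm->exists)
		vm->flags &= (uint8_t) ~TOY_ALL_VISIBLE;

	if (toy_read_for_redo(heap, record_lsn) == TOY_BLK_NEEDS_REDO)
	{
		heap->flags &= (uint8_t) ~TOY_ALL_VISIBLE;
		heap->lsn = record_lsn;
	}
}
```

In this model, a VM bit set while the heap block was skipped is still
cleared by any later clearing record, whatever the heap page LSN says.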

Anyway, I'm open to changing the comment. I could:
1) copy-paste the same comment as heap_xlog_visible()
2) refer to the comment in heap_xlog_visible() (comment seemed a bit
short for that)
3) diverge the comments further by improving the new comment in
heap_xlog_multi_insert() in some way
4) something else?

> The second paragraph does not convince me at all. I see no reason to
> believe that this is safe, or that it is a good idea. The code in
> xlog_heap_visible() thinks its OK to unlock and relock the page to
> make visibilitymap_set() happy, which is cringy but probably safe for
> lack of concurrent writers, but skipping locking altogether seems
> deeply unwise.

Actually, in master, heap_xlog_visible() has no lock on the heap page
when it calls visibilitymap_set(). It releases that lock before
recording the freespace in the FSM and doesn't take it again.

It does unlock and relock the VM page -- because visibilitymap_set()
expects to take the lock on the VM.

I agree that not holding the heap lock while updating the VM is
unsatisfying. We can't hold it while doing the IO to read in the VM
block in XLogReadBufferForRedoExtended(), so we could take it again
before calling visibilitymap_set(). However, we don't always have the
heap buffer. I suspect this is partially why heap_xlog_visible()
unconditionally passes InvalidBuffer to visibilitymap_set() as the
heap buffer and has special case handling for recovery when we don't
have the heap buffer.

In any case, it isn't an active bug, and I don't think future-proofing
VM replay (i.e. against parallel recovery) is a prerequisite for
committing this patch since it is also that way on master.

> - * visibilitymap_set - set a bit in a previously pinned page
> + * visibilitymap_set - set bit(s) in a previously pinned page and log
> + * visibilitymap_set_vmbits - set bit(s) in a pinned page
>
> I suspect the indentation was done with a different mix of spaces and
> tabs here, because this doesn't align for me.

oops, fixed.

I pushed the ERRCODE_DATA_CORRUPTED patch, so attached v11 is rebased
and also has the changes mentioned above.

Since you've started reviewing the set, I'll note that patches
0005-0011 are split up for ease of review and it may not necessarily
make sense to keep that separation for eventual commit. They are a
series of steps to move VM updates from lazy_scan_prune() into
pruneheap.c.

- Melanie

Attachment Content-Type Size
v11-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch text/x-patch 11.8 KB
v11-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch text/x-patch 28.3 KB
v11-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch text/x-patch 5.4 KB
v11-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch text/x-patch 7.4 KB
v11-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch text/x-patch 5.9 KB
v11-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch text/x-patch 3.1 KB
v11-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch text/x-patch 12.1 KB
v11-0006-Combine-vacuum-phase-I-VM-update-cases.patch text/x-patch 5.8 KB
v11-0009-Update-VM-in-pruneheap.c.patch text/x-patch 12.5 KB
v11-0010-Rename-PruneState.freeze-to-attempt_freeze.patch text/x-patch 3.7 KB
v11-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch text/x-patch 7.1 KB
v11-0014-Use-GlobalVisState-to-determine-page-level-visib.patch text/x-patch 10.8 KB
v11-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch text/x-patch 29.2 KB
v11-0012-Remove-xl_heap_visible-entirely.patch text/x-patch 24.0 KB
v11-0015-Inline-TransactionIdFollows-Precedes.patch text/x-patch 5.0 KB
v11-0016-Unset-all-visible-sooner-if-not-freezing.patch text/x-patch 2.5 KB
v11-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch text/x-patch 27.1 KB
v11-0020-Set-pd_prune_xid-on-insert.patch text/x-patch 6.5 KB
v11-0018-Add-helper-functions-to-heap_page_prune_and_free.patch text/x-patch 19.2 KB
v11-0019-Reorder-heap_page_prune_and_freeze-parameters.patch text/x-patch 5.8 KB
