Quick Links

Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject:	Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date:	2025-09-10 20:01:25
Message-ID:	CA+TgmobYY2URHKBMh1NHo1zF3Z28TiS_+0aSyRYyBfvauCPZzA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Fair. I've introduced new XLHP flags in attached v13. Hopefully it
> puts an end to the horror.

I suggest not renumbering all of the existing flags and just adding
these new ones at the end. Less code churn and more likely to break in
an obvious way if you mix up the two sets of flags.

More on 0002:

+ set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;

Maybe just if (XLogHintBitIsNeeded) set_heap_lsn = true? I don't feel
super-strongly that what you've done is bad but it looks weird to my
eyes.

+ * If we released any space or line pointers or will be setting a page in
+ * the visibility map, measure the page's freespace to later update the

"setting a page in the visibility map" seems a little muddled to me.
You set bits, not pages.

+ * all-visible (or all-frozen, depending on the vacuum mode,) which is

This comma placement is questionable.

/*
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
*/

How many copies of this comment do you plan to end up with?

The comment for log_heap_prune_and_freeze seems to be anticipating future work.

> > 0004. It is not clear to me why you need to get
> > log_heap_prune_and_freeze to do the work here. Why can't
> > log_newpage_buffer get the job done already?
>
> Well, I need something to emit the changes to the VM. I'm eliminating
> all users of xl_heap_visible. Empty pages are the ones that benefit
> the least from switching from xl_heap_visible -> xl_heap_prune. But,
> if I don't transition them, we have to maintain all the
> xl_heap_visible code (including visibilitymap_set() in its long form).
>
> As for log_newpage_buffer(), I could keep it if you think it is too
> confusing to change log_heap_prune_and_freeze()'s API (by passing
> force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
> there and then call log_heap_prune_and_freeze().
>
> I just thought it seemed simple to avoid emitting the new page record
> and the VM update record, so why not -- but I don't have strong
> feelings.

Yeah, I'm not sure what the right thing to do here is. I think I was
again experiencing brain fade by forgetting that there is a heap page
and a VM page and, of course, log_heap_newpage() probably isn't going
to touch the latter. So that makes sense. On the other hand, we could
only have one type of WAL record for every single operation in the
system if we gave it enough flags, and force_heap_fpi seems
suspiciously like a flag that turns this into a whole different kind
of WAL record.

> > 0005. It looks a little curious that you delete the
> > identify-corruption logic from the end of the if-nest and add it to
> > the beginning. Ceteris paribus, you'd expect that to be worse, since
> > corruption is a rare case.
>
> On master, the two corruption cases are sandwiched between the normal
> VM set cases. And I actually think doing it this way is brittle. If
> you put the cases which set the VM first, you have to have completely
> bulletproof the if statements guarding them to foreclose any possible
> corruption case from entering because otherwise you will overwrite the
> corruption you then try to detect.

Hmm. In the current code, we first test (!all_visible_according_to_vm
&& presult.all_visible), then (all_visible_according_to_vm &&
!PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
PageIsAllVisible(page)). The first and second can never coexist,
because they require opposite values of all_visible_according_to_vm.
The second and third cannot coexist because they require opposite
values of PageIsAllVisible(page). It is not entirely obvious that the
first and third tests couldn't both pass, but you'd have to have
presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
to see how heap_page_prune_and_freeze() could ever allow that.
Consider:

if (prstate.all_visible && prstate.lpdead_items == 0)
{
presult->all_visible = prstate.all_visible;
presult->all_frozen = prstate.all_frozen;
}
else
{
presult->all_visible = false;
presult->all_frozen = false;
}
...
presult->lpdead_items = prstate.lpdead_items;

So I don't really think I'm persuaded that the current way is brittle.
But that having been said, I agree with you that the order of the
checks is kind of random, and I don't think it really matters that
much for performance. What does matter is clarity. I feel like what
I'd ideally like this logic to do is say: do we want the VM bit for
the page to be set to all-frozen, just all-visible, or neither? Then
push the VM bit to the correct state, dragging the page-level bit
along behind. And the current logic sort of does that. It's roughly:

1. Should we go from not-all-visible to either all-visible or
all-frozen? If yes, do so.
2. Should we go from either all-visible or all-frozen to
not-all-visible? If yes, do so.
3. Should we go from either all-visible or all-frozen to
not-all-visible for a different reason? If yes, do so.
4. Should we go from all-visible to all-frozen? If yes, do so.

But what's weird is that all the tests are written differently, and we
have two different reasons for going to not-all-visible, namely
PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
one test for each of the other state-transitions, because the
decision-making for those cases is fully completed at an earlier
stage. I would kind of like to see this expressed in a way that first
decides which state transition to make (forward-to-all-frozen,
forward-to-all-visible, backward-to-all-visible,
backward-to-not-all-visible, nothing) and then does the corresponding
work. What you're doing instead is splitting half of those functions
off into a helper function while keeping the other half where they are
without cleaning up any of the logic. Now, maybe that's OK: I'm far
from having grokked the whole patch set. But it is not any more clear
than what we have now, IMHO, and perhaps even a bit less so.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) at 2025-09-09 23:07:58 from Melanie Plageman

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David G. Johnston	2025-09-10 20:02:46	Stale comment in guc.h; publish listing of setting sources?
Previous Message	Peter Geoghegan	2025-09-10 19:59:24	Re: PostgreSQL 18 GA press release draft