| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Kirill Reshke <reshkekirill(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
| Date: | 2026-03-20 02:38:47 |
| Message-ID: | CAAKRu_bsHbvt+VqbjHXsdphKf8fqwBEutRhH3fmo+qUVe=yBKw@mail.gmail.com |
| Lists: | pgsql-hackers |
Thanks for the detailed review! Unless otherwise specified, attached
v41 includes all of your straightforward review points.
On Wed, Mar 18, 2026 at 1:14 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > + params.relation = relation;
> > + params.buffer = buffer;
> > + params.vmbuffer = *vmbuffer;
> > + params.reason = PRUNE_ON_ACCESS;
> > + params.vistest = vistest;
> > + params.cutoffs = NULL;
> >
>
> > * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
> > @@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> > * cannot safely determine that during on-access pruning with the
> > * current implementation.
> > */
> > - PruneFreezeParams params = {
> > - .relation = relation,
> > - .buffer = buffer,
> > - .reason = PRUNE_ON_ACCESS,
> > - .options = 0,
> > - .vistest = vistest,
> > - .cutoffs = NULL,
> > - };
> > + params.options = 0;
> >
> > heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
> > NULL, NULL);
>
> Why does this change the way the PruneFreezeParams variable is defined? I
> don't really mind, it's just a bit confusing.
I couldn't use the designated initializer after visibilitymap_pin(),
and I thought it was worse to have the designated initializer
initialize vmbuffer to InvalidBuffer and then have to set vmbuffer to
the real vmbuffer after visibilitymap_pin().
> > + * Helper to fix visibility-related corruption on a heap page and its
> > + * corresponding VM page. An all-visible page cannot have dead items nor can
> > + * it have tuples that are not visible to all running transactions. It clears
> > + * the VM corruption as well as resetting the vmbits used during pruning.
>
> So this is now only called when we already know there's corruption? I think
> that could be clearer in the comments.
>
> Seems a bit odd that the function then figures out what it should do from the
> page & VM contents, given that the caller already needs to have known that
> something is wrong?
Yea, it was all a bit off. I agree. I've tried something new and made
a VMCorruptionType enum for the caller to pass in which tells this
function what to do (clear PD_ALL_VISIBLE and/or clear VM) and what
warning to emit.
> > +static void
> > +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> > +{
> > + {
> > + ereport(WARNING,
> > + (errcode(ERRCODE_DATA_CORRUPTED),
> > + errmsg("tuple not visible to all transactions found on page marked all-visible"),
> > + errcontext("relation \"%s\", page %u, tuple %u",
> > + relname, prstate->block, offnum)));
> > + }
>
> Wait, why are we now WARNING about the PageIsAllVisible() &&
> prstate->lpdead_items == 0 case? Seems to run flatly counter to the comment
> above about GetOldestNonRemovableTransactionId() going backward?
Only if the page has tuples with HTSV_Result
HEAPTUPLE_RECENTLY_DEAD/DELETE_IN_PROGRESS/INSERT_IN_PROGRESS. Even if
GetOldestNonRemovableTransactionId() goes backwards that should only
make it so that xids we previously thought were visible now show as
not visible to all. But those have to be HEAPTUPLE_LIVE tuples. We
should never have thought it was all-visible if there were in-progress
deletes/inserts. So, I think it is okay. Now (in v41), the caller
would need to pass VM_CORRUPT_TUPLE_VISIBILITY and intend to emit the
warning.
> > + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> > + {
> > + /*
> > + * As of PostgreSQL 9.2, the visibility map bit should never be set if
> > + * the page-level bit is clear. However, for vacuum, it's possible
> > + * that the bit got cleared after heap_vac_scan_next_block() was
> > + * called, so we must recheck now that we have the buffer lock before
> > + * concluding that the VM is corrupt.
> > + */
> > + ereport(WARNING,
> > + (errcode(ERRCODE_DATA_CORRUPTED),
> > + errmsg("page is not marked all-visible but visibility map bit is set"),
> > + errcontext("relation \"%s\", page %u",
> > + relname, prstate->block)));
> > + }
> > +
> > + visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
> > + VISIBILITYMAP_VALID_BITS);
> > + prstate->vmbits = 0;
>
> So we can end up clearing the VM without emitting any warning?
This was me trying to avoid duplicating code in the branches. In v41,
I error out if the caller doesn't specify a valid corruption type, so
anything that clears the VM will have emitted a warning.
> > +static void
> > +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> > +{
> > + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> > + Page page = prstate->page;
> > +
> > + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> > + (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> > + !prstate->attempt_freeze));
> > +
> > + /* We'll fill in presult for the caller */
> > + memset(presult, 0, sizeof(PruneFreezeResult));
> > +
> > + presult->vmbits = prstate->vmbits;
> > +
> > + /* Clear any stale prune hint */
> > + if (TransactionIdIsValid(PageGetPruneXid(page)))
> > + {
> > + PageClearPrunable(page);
> > + MarkBufferDirtyHint(prstate->buffer, true);
> > + }
> > +
> > + if (PageIsEmpty(page))
> > + return;
> > +
> > + presult->hastup = true;
>
> Is that actually a given? Couldn't the page consist solely out of unused
> items? That'd make PageIsEmpty() return false, but should still allow
> truncation.
Good point. I've changed it to set hastup when counting live tuples.
But should I set hastup if I see an LP_REDIRECT pointer? I know I
should always see an LP_NORMAL pointer if I see an LP_REDIRECT pointer,
but I just wondered if I should explicitly set hastup when I see
LP_REDIRECT, since heap_prune_record_redirect() sets hastup = true.
> > diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
>
> > + /*
> > + * After processing all the live tuples on the page, if the newest xmin
> > + * amongst them may be considered running by any snapshot, the page cannot
> > + * be all-visible.
> > + */
> > + if (prstate.set_all_visible &&
> > + TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
> > + GlobalVisTestXidMaybeRunning(prstate.vistest,
> > + prstate.visibility_cutoff_xid))
> > + prstate.set_all_visible = prstate.set_all_frozen = false;
> > +
>
> So the docs for prstate.visibility_cutoff_xid say:
>
> * visibility_cutoff_xid is the newest xmin of live tuples on the page.
> * The caller can use it as the conflict horizon, when setting the VM
> * bits. It is only valid if we froze some tuples, and set_all_frozen is
> * true.
>
> But here we look at it without checking that we froze some tuples. I guess
> the comment is outdated?
That comment was never correct -- or I have chopped it into
unrecognizable bits over the last two years.
> Could the "going backward" thing possibly trigger a spurious assert in
>
> Assert(heap_page_is_all_visible(vacrel->rel, buf,
> vacrel->vistest, &debug_all_frozen,
> &debug_cutoff, &vacrel->offnum));
I don't think anything (today) updates GlobalVisState between
GlobalVisTestXidMaybeRunning() and the heap_page_is_all_visible()
assert.
I had removed the visibility_cutoff_xid part of the assertion on the
intuition that comparing an exact horizon would no longer work when
using GlobalVisState. I can't remember if I actually saw failing
tests, but I don't see them anymore (so I've put it back).
The heap_page_is_all_visible() assertion moves into
heap_page_prune_and_freeze() in a later patch in this set, and while
it is also in a place where I don't think GlobalVisState can have
moved between making the page changes and calling
heap_page_is_all_visible(), I suspect it won't be a totally reliable
assertion now that it uses a moving target for comparison. What do you
think?
> > From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Sat, 28 Feb 2026 16:06:51 -0500
> > Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
> > all-visible
>
> I guess I'd have expected 03 and 04 to be swapped... But whatever.
They couldn't be swapped, because I used GlobalVisState to always keep
it up-to-date (even for on-access pruning).
> > @@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> > prstate.set_all_visible = prstate.set_all_frozen = false;
> >
> > Assert(!prstate.set_all_frozen || prstate.set_all_visible);
> > + Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
>
> Why didn't we have this assert earlier?
It was in lazy_scan_prune() as:
Assert(!presult.set_all_visible || !(*has_lpdead_items));
> > + do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
>
> Most of the other heap_page_prune_and_freeze() helpers are named
> heap_prune_xyz(), why not follow that here?
>
> I guess this holds for a few other helpers added in earlier commits
> too. E.g. heap_page_bypass_prune_freeze() should probably be
> heap_prune_bypass_prune_freeze() or such.
Most of the helpers prefixed with "heap_prune" now directly do
something related to pruning like recording line pointers and
traversing hot chains. heap_page_will_set_vm() and
heap_page_will_freeze() have nothing to do with pruning, so I think it
makes sense they are named differently.
And I don't think we are in any danger of folks using functions not
prefixed with heap_prune for other purposes, given that most of them
take a PruneState as an argument.
I'll do a big rename if you feel strongly about it, though.
> > @@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >
> > /*
> > * If that's all we had to do to the page, this is a non-WAL-logged
> > - * hint. If we are going to freeze or prune the page, we will mark
> > - * the buffer dirty below.
> > + * hint. If we are going to freeze or prune the page or set
> > + * PD_ALL_VISIBLE, we will mark the buffer dirty below.
> > + *
> > + * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
> > + * for the VM to be set and PD_ALL_VISIBLE to be clear.
> > */
> > - if (!do_freeze && !do_prune)
> > + if (!do_freeze && !do_prune && !do_set_vm)
> > MarkBufferDirtyHint(prstate.buffer, true);
> > }
>
> This block is gated by if (do_hint_prune) which is computed as:
>
> /*
> * Even if we don't prune anything, if we found a new value for the
> * pd_prune_xid field or the page was marked full, we will update the hint
> * bit.
> */
> do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
> PageIsFull(prstate.page);
>
> It's not really related to this change, but I'm just confused a bit by the
> "|| PageIsFull(prstate.page)". What is that about? Why do we want to mark the
> buffer DirtyHint if the page is full? It very well might already have been
> marked as such, no?
Because if the page is marked full, we clear that hint, and, if that's
the only change we make to the page, we need to do
MarkBufferDirtyHint().
> > +#ifdef USE_ASSERT_CHECKING
> > + if (prstate.set_all_visible)
> > + {
> > + TransactionId debug_cutoff;
> > + bool debug_all_frozen;
> > +
> > + Assert(prstate.lpdead_items == 0);
> > +
> > + Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
> > + prstate.vistest,
> > + &debug_all_frozen,
> > + &debug_cutoff, off_loc));
> > +
> > + /*
> > + * It's possible the page is composed entirely of frozen tuples but is
> > + * not set all-frozen in the VM and did not pass
> > + * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
> > + * heap_page_is_all_visible() finds the page completely frozen, even
> > + * though prstate.set_all_frozen is false.
> > + */
> > + Assert(!prstate.set_all_frozen || debug_all_frozen);
>
> Seems like we could verify that debug_cutoff isn't newer than conflict_xid?
Well, not conflict_xid, but newest_xid, yes.
> Hm. I guess aborting after we did incorrect pruning/freezing/VMing is better
> than not, but it'd be even better if we did it before corrupting things. But I
> guess it'd be not trivial to add something like the debug_cutoff assertion I
> suggest above, when freezing of tuples is only executed after
> heap_page_is_all_visible() (for dead tuples heap_page_would_be_all_visible()
> already has provisions).
>
> It's probably more a theoretical concern than a real worry.
Yea, I think the work it would take to make
heap_page_would_be_all_visible() work for frozen tuples wouldn't be
worth it just to get it to assert out before executing the page
changes.
> > + presult->new_all_visible_pages = 0;
> > + presult->new_all_frozen_pages = 0;
> > + presult->new_all_visible_frozen_pages = 0;
>
> Isn't it odd to talk about pages here? Given that heap_page_prune_and_freeze()
> only ever operates on exactly one page. Is that just so you can do
>
> > + vacrel->new_all_visible_pages += presult.new_all_visible_pages;
I made this change because you didn't like it when I passed old_vmbits
and new_vmbits back out to lazy_scan_prune() to derive these counters.
FWIW I think it's better not to have lazy_scan_prune() compare new and
old vmbits to increment counters, because lazy_scan_prune() shouldn't
have to know about the VM anymore once it is not setting it.
> > + if (do_set_vm)
> > + {
> > + if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > + {
> > + presult->new_all_visible_pages = 1;
> > + if (prstate.set_all_frozen)
> > + presult->new_all_visible_frozen_pages = 1;
> > + }
> > + else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> > + prstate.set_all_frozen)
> > + presult->new_all_frozen_pages = 1;
> > + }
> > +
> > if (prstate.attempt_freeze)
> > {
> > if (presult->nfrozen > 0)
>
> Feels like this is kinda redoing what heap_page_will_set_vm already did.
The logic is different than what is in heap_page_will_set_vm() because
there we don't care about what old_vmbits is. We are simply concerned
with whether we should set new_vmbits to something.
So we need to have logic somewhere that is figuring out if the vmbits
were set before and whether we newly set them. That can either go in
heap_page_prune_and_freeze() and we can use that to set the counters
in the LVRelState or it can go in lazy_scan_prune().
I think it makes more sense in heap_page_prune_and_freeze() so that
lazy_scan_prune() doesn't have to know about the VM's new/old state,
which it otherwise no longer deals with.
> > @@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
> > PageSetAllVisible(page);
> > PageClearPrunable(page);
> > - visibilitymap_set(vacrel->rel, blkno, buf,
> > - InvalidXLogRecPtr,
> > - vmbuffer, InvalidTransactionId,
> > - VISIBILITYMAP_ALL_VISIBLE |
> > - VISIBILITYMAP_ALL_FROZEN);
> > + visibilitymap_set_vmbits(blkno,
> > + vmbuffer,
> > + VISIBILITYMAP_ALL_VISIBLE |
> > + VISIBILITYMAP_ALL_FROZEN,
> > + vacrel->rel->rd_locator);
> > +
> > + /*
> > + * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
> > + * setting the VM.
> > + */
> > + if (RelationNeedsWAL(vacrel->rel))
> > + log_heap_prune_and_freeze(vacrel->rel, buf,
> > + vmbuffer,
> > + VISIBILITYMAP_ALL_VISIBLE |
> > + VISIBILITYMAP_ALL_FROZEN,
> > + InvalidTransactionId, /* conflict xid */
> > + false, /* cleanup lock */
> > + PRUNE_VACUUM_SCAN, /* reason */
> > + NULL, 0,
> > + NULL, 0,
> > + NULL, 0,
> > + NULL, 0);
> > +
> > END_CRIT_SECTION();
>
> It's a tad odd that we do:
>
> /*
> * It's possible that another backend has extended the heap,
> * initialized the page, and then failed to WAL-log the page due
> * to an ERROR. Since heap extension is not WAL-logged, recovery
> * might try to replay our record setting the page all-visible and
> * find that the page isn't initialized, which will cause a PANIC.
> * To prevent that, check whether the page has been previously
> * WAL-logged, and if not, do that now.
> */
> if (RelationNeedsWAL(vacrel->rel) &&
> !XLogRecPtrIsValid(PageGetLSN(page)))
> log_newpage_buffer(buf, true);
>
> if we then immediately afterwards emit a WAL record that could just as well
> have included in FPI of the heap page.
I originally added a flag to log_heap_prune_and_freeze() that could
force an FPI but Robert disliked it, saying he found it more
confusing. He said:
> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?
I can put it back that way, I don't have strong feelings either way.
Though I imagine if I add another argument to
log_heap_prune_and_freeze(), you'll bring up creating a struct for its
arguments again...
> > diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
> > index 8a67bfa1aff..d9042e1f91d 100644
> > --- a/src/backend/access/common/bufmask.c
> > +++ b/src/backend/access/common/bufmask.c
> > @@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
> >
> > /*
> > * During replay, if the page LSN has advanced past our XLOG record's LSN,
> > - * we don't mark the page all-visible. See heap_xlog_visible() for
> > - * details.
> > + * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
> > + * more details.
> > */
> > PageClearAllVisible(page);
> > }
>
> Not introduced by your change, but isn't it rather terrifying that the
> wal_consistency_checking infrastructure doesn't verify whether the page is
> marked all-visible? Wasn't aware of this. Seems bonkers to me.
Agreed. I wonder what it would take to start.
> I don't even know what specifically in heap_xlog_visible() that comment is
> referring to? Just that we only do PageSetAllVisible() if BLK_NEEDS_REDO? But
> uh, what does that have to do with anything?
Yea, this comment doesn't make sense. I think we should remove it.
But regarding why we mask PD_ALL_VISIBLE in wal consistency checking,
I wonder if this is the scenario:
1. Record 1 (R1) sets the VM and PD_ALL_VISIBLE.
2. Record 2 (R2) inserts a tuple and clears PD_ALL_VISIBLE.
3. The heap page is flushed to disk, but the VM page is not.
4. Crash.
5. Replay R1: skip setting PD_ALL_VISIBLE because the page has R2's
LSN, but set the bits on the VM page.
Even though we'll clear the VM when we replay R2, if we cross-check
the page and VM after replaying only R1, the VM will be set and
PD_ALL_VISIBLE will be clear. I think this is okay because no one
should see them at this time. But it might not work with wal
consistency checking.
> > @@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
> > * If we get passed InvalidTransactionId then we do nothing (no conflict).
> > *
> > * This can happen when replaying already-applied WAL records after a
> > - * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
> > - * record that marks as frozen a page which was already all-visible. It's
> > - * also quite common with records generated during index deletion
> > - * (original execution of the deletion can reason that a recovery conflict
> > - * which is sufficient for the deletion operation must take place before
> > - * replay of the deletion record itself).
> > + * standby crash or restart
>
> Again not about your patch: I don't understand how already applied WAL can
> lead to InvalidTransactionId being passed here. The record doesn't change just
> because we had already applied the WAL?
Yea, I think the comment is just wrong. I realized the comment still
needed to reference my code, so I've updated it.
> > From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Wed, 3 Dec 2025 15:07:24 -0500
> > Subject: [PATCH v40 08/12] Track which relations are modified by a query
> >
> > Save the relids of modified relations in a bitmap in the executor state.
> > A later commit will pass this information down to scan nodes to control
> > whether or not on-access pruning is allowed to set the visibility map.
> > Setting the visibility map during a scan is counterproductive if the
> > query is going to modify the page immediately after.
> >
> > Relations are considered modified if they are the target of INSERT,
> > UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
> > FOR UPDATE/SHARE). All row mark types are included, even those which
> > don't actually modify tuples, because this bitmap is only used as a hint
> > to avoid unnecessary work.
>
> You're probably going to hate me for the question, but is there a reason to
> not compute es_modified_relids at plan time?
Yea, it probably does make more sense there. The only thing is that by
doing it in the planner, it could include relids of leaf partitions
that get run-time pruned. But we won't scan those, so it is no issue
for this feature. I'm just wondering if it dilutes the meaning of
"modified relids", though.
In v41, I've implemented it in the planner (which also made me realize
parallel workers previously didn't have es_modified_relids, oops).
> > @@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
> > */
> > planstate = ExecInitNode(plan, estate, eflags);
> >
> > +#ifdef USE_ASSERT_CHECKING
> > + CrossCheckModifiedRelids(estate);
> > +#endif
>
> Not sure that buys you much, given it pretty much is just a restatement of the
> code building estate->es_modified_relids.
Yea, now that I've done it in planner, I cross-check in the executor.
> What about checking against PlannedStmt->{resultRelations, permInfos} or
I don't think it makes sense to use permInfos because according to
expand_single_inheritance_child() there is no permission checking for
child RTEs, so I think permInfos won't include everything we need.
> asserting membership at the places that actually lock/modify?
Are you thinking I should also add some in ExecInsert, ExecDelete,
ExecUpdate, and ExecLockRows? Think this might be redundant with the
executor cross-check I have now after InitPlan(). (I've done it anyway
so we can discuss).
> > diff --git a/src/include/access/genam.h b/src/include/access/genam.h
> > index 1a27bf060b3..db102803eb5 100644
> > --- a/src/include/access/genam.h
> > +++ b/src/include/access/genam.h
> > @@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
> > Relation indexRelation,
> > Snapshot snapshot,
> > IndexScanInstrumentation *instrument,
> > - int nkeys, int norderbys);
> > + int nkeys, int norderbys, uint32 flags);
>
> I'd probably put flags in a position where it's not as easily confused with
> nkeys or norderbys.
Do you mean move it to just before nkeys and norderbys, or move it
even earlier? I did the latter, but I'm not sure if it's weird to have
flags before snapshot (especially since the other table AM routines
pass it last). I think it looks kind of weird when all of the other
ones have flags as the last argument.
> > @@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
> > static inline TableScanDesc
> > table_beginscan_tidrange(Relation rel, Snapshot snapshot,
> > ItemPointer mintid,
> > - ItemPointer maxtid)
> > + ItemPointer maxtid, uint32 flags)
> > {
> > TableScanDesc sscan;
> > - uint32 flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> > +
> > + flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> >
> > sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
>
> Hm. Would it perhaps be a good idea to have an assert as to which flags are
> specified by the "user"? If e.g. another SO_TYPE_* were specified it might
> result in some odd behaviour.
>
> Perhaps this would be best done by adding an argument to
> table_beginscan_common() specifying the "internal" flags (i.e. the ones that
> specified inside table_beginscan_*) and user specified flags? Then
> table_beginscan_common could check the set of user specified flags being sane.
Yes, good idea. Done in attached v41.
It's unclear to me which flags should be considered internal, though.
I think it makes sense that the SO_TYPE_* flags are considered
internal, because you can only specify one. But all of the other
current ScanOptions are specified inside table_beginscan_*, so do you
mean we should consider all of those internal flags, too?
> > + * Some optimizations can only be performed if the query does not modify
> > + * the underlying relation. Track that here.
> > + */
> > + bool modifies_base_rel;
> > } IndexFetchHeapData;
>
> Wonder if this should be in the generic IndexFetchTableData?
I added flags to the IndexFetchTableData in much the same way as the
regular table scan descriptor has them.
> > diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
> > index d264a698ff6..a5536ba4ff6 100644
> > --- a/src/test/recovery/t/035_standby_logical_decoding.pl
> > +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> > @@ -296,6 +296,7 @@ wal_level = 'logical'
> > max_replication_slots = 4
> > max_wal_senders = 4
> > autovacuum = off
> > +hot_standby_feedback = on
> > });
> > $node_primary->dump_info;
> > $node_primary->start;
> > @@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
> > $logstart = -s $node_standby->logfile;
> >
> > reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
> > - 'no_conflict_', 0, 1);
> > + 'no_conflict_', 1, 0);
> >
> > # This should not trigger a conflict
> > wait_until_vacuum_can_remove(
> > --
> > 2.43.0
>
> Why does this patch need to change anything here? Is the test buggy
> independently?
Nope. I guess that was a mistake during development. No change needed.
> > @@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
> > prstate->set_all_visible = false;
> > prstate->set_all_frozen = false;
> >
> > - /* The page should not be marked all-visible */
> > - if (PageIsAllVisible(page))
> > - heap_fix_vm_corruption(prstate, offnum);
> > -
>
> Huh?
heap_prune_record_prunable() already does the corruption check, so I
don't need to do it separately for INSERT_IN_PROGRESS tuples once we
call heap_prune_record_prunable() for them.
- Melanie
| Attachment | Content-Type | Size |
|---|---|---|
| v41-0001-Fix-visibility-map-corruption-in-more-cases.patch | text/x-patch | 20.5 KB |
| v41-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch | text/x-patch | 7.7 KB |
| v41-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch | text/x-patch | 12.3 KB |
| v41-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch | text/x-patch | 15.4 KB |
| v41-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch | text/x-patch | 23.1 KB |
| v41-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch | text/x-patch | 2.7 KB |
| v41-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch | text/x-patch | 27.5 KB |
| v41-0008-Track-which-relations-are-modified-by-a-query.patch | text/x-patch | 8.7 KB |
| v41-0009-Thread-flags-through-begin-scan-APIs.patch | text/x-patch | 32.9 KB |
| v41-0010-Pass-down-information-on-table-modification-to-s.patch | text/x-patch | 11.3 KB |
| v41-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch | text/x-patch | 10.1 KB |
| v41-0012-Set-pd_prune_xid-on-insert.patch | text/x-patch | 8.8 KB |