Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: 2025-12-22 07:19:39
Message-ID: 6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 20, 2025, at 05:09, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>
> Attached v29 addresses some feedback and also corrects a small error
> with the assertion I had added in the previous version's 0009.
>
> On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>>
>> I’ve done a basic review of patches 1 and 2. Here are some comments
>> which may be somewhat immature, as this is a fairly large change set
>> and I’m new to some parts of the code.
>>
>> 1) Potential stale old_vmbits after VM repair n v2
>
> Good catch! I've fixed this in attached v29.
>
>> 2) Add Assert(BufferIsDirty(buf))
>>
>> Since the patch's core claim is "buffer must be dirty before WAL
>> registration", an assertion encodes this invariant. Should we add:
>>
>> Assert(BufferIsValid(buf));
>> Assert(BufferIsDirty(buf));
>>
>> right before the visibilitymap_set() call?
>
> There are already assertions that will trip in various places -- most
> importantly in XLogRegisterBuffer(), which is the one that inspired
> this refactor.
>
>> The comment at lines:
>>> "The only scenario where it is not already dirty is if the VM was removed…"
>>
>> This phrasing could become misleading after future refactors. Can we
>> make it more direct like:
>>
>>> "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
>
> I see your point about future refactors missing updating comments like
> this. But, I don't think we are going to refactor the code such that
> we can have PD_ALL_VISIBLE set without the VM bits set more often.
> Also, it is common practice in Postgres to describe very specific edge
> cases or odd scenarios in order to explain code that may seem
> confusing without the comment. It does risk that comment later
> becoming stale, but it is better that future developers understand why
> the code is there.
>
> That being said, I take your point that the comment is confusing. I
> have updated it in a different way.
>
>>> "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
>>
>> In this test we now call MarkBufferDirty() on the heap page even when
>> only setting the VM, so the comments claiming “does not need to modify
>> the heap buffer”/“no heap page modification” might be misleading. It
>> might be better to say the test doesn’t need to modify heap
>> tuples/page contents or doesn’t need to prune/freeze.
>
> The point I'm trying to make is that we have to dirty the buffer even
> if we don't modify the page because of the XLOG sub-system
> requirements. And, it may seem like a waste to do that if not
> modifying the page, but the page will rarely be clean anyway. I've
> tried to make this more clear in attached v29.
>
> - Melanie
> <v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch><v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch><v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch><v29-0005-Move-VM-assert-into-prune-freeze-code.patch><v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch><v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v29-0011-Unset-all_visible-sooner-if-not-freezing.patch><v29-0012-Track-which-relations-are-modified-by-a-query.patch><v29-0013-Pass-down-information-on-table-modification-to-s.patch><v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch><v29-0015-Set-pd_prune_xid-on-insert.patch>

A few more comments on v29:

1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.

2 - 0003
```
+ * Helper to correct any corruption detected on an heap page and its
```

Nit: “an” -> “a”

3 - 0003
```
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
```

Right before this function is called:
```
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
```

So, the Assert() is checking if old_vmbits is newly returned from visibilitymap_get_status(), in that case, identify_and_fix_vm_corruption() can take vmbits as a pointer , and it calls visibilitymap_get_status() to get vmbits itself and returns vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.

4 - 0004
```
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
+ * we have attempted to update the VM.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
```

The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how to decide "we have attempted to update the VM”? By reading the code:
```
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
```

It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t the “and” part.

5 - 0004
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ if (!prstate->attempt_update_vm)
+ return false;
```

old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.

I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().

6 - 0004
```
@@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
+

/* Initialize prstate */
```

Nit: an extra empty line is added.

7 - 0005
```
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
```

Nit: a tailing dot is needed in the end of the comment line.

8 - 0005
```
@@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
```

I guess the variable name “vm_conflict_horizon” comes from the old "presult->vm_conflict_horizon”. But in the new logic, this variable is used more generic, for example Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed to “conflict_xid”, so it’s up to you if or not rename it. But to make the commit self-contained, I’d suggest renaming it.

9 - 0006
```
@@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after = InvalidTransactionId;
```

This initialization seems to not needed, as HeapTupleSatisfiesVacuumHorizon() will always set a value to it.

10 - 0010
```
+ * there is any snapshot that still consider the newest xid on
```

Nit: consider -> considers

11 - 0011
```
+ * page. If we won't attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
```

The comment says “just unset all-visible”, but the code actually also unset all_frozen.

12 - 0012
```
+ /*
+ * RT indexes of relations modified by the query either through
+ * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
+ */
+ Bitmapset *es_modified_relids;
```

As we intentionally only want indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit.

13 - 0012
```
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
```

I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. If a reader not associating this code with this patch, he may consider the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase like “If it has a rowmark, the relation may modify or lock heap pages”.

14 - 0015 - commit message
```
Setting pd_prune_xid on insert can cause a page to be dirtied and
written out when it previously would not have been, affetcting the
```

Typo: affetcting -> affecting

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2025-12-22 07:56:59 Re: Implement waiting for wal lsn replay: reloaded
Previous Message vignesh C 2025-12-22 06:48:32 Re: Proposal: Conflict log history table for Logical Replication