| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
| Date: | 2025-12-09 17:48:45 |
| Message-ID: | CAAKRu_bvkCZ+xBzBjujJMkA5STU+b6AtrUUTjcvAH=ZnnpTtzA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Dec 4, 2025 at 12:11 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> I resisted this patch again today. I reviewed 0001-0004, and got a few more comments:
Thanks for the review! v24 attached with updates you suggested as well
as the bug fix described below.
I realized my code didn't mark the heap buffer dirty if we were not
modifying it (i.e. only setting the VM). This trips an assert in
XLogRegisterBuffer() which requires that all buffers registered with
the WAL machinery are marked dirty unless REGBUF_NO_CHANGE is passed.
It wasn't possible to hit it in master because we unconditionally
dirtied the buffer if we found the VM not set in
find_next_unskippable_block() -- even if we made no changes to the
heap buffer. But my refactoring only dirtied the heap buffer if we
modified it (either pruning/freezing or setting PD_ALL_VISIBLE).
In attached v24, I once again always dirty the heap buffer before
registering it. We can't skip adding the heap buffer to the WAL chain
even if we didn't modify it, because we use it to update the freespace
map during recovery. We could pass REGBUF_NO_CHANGE when the heap
buffer is completely unmodified. But the delicate special case logic
doesn't seem worth the effort to maintain, as the only time the heap
buffer should be unmodified is when the VM has been truncated or
removed. I added a test to the commit doing this refactoring that
would have caught my mistake (0003).
I also split the refactoring of the VM setting logic into more commits
to help make it clearer (0003-0004). We could technically commit the
refactoring commits to master. I had not originally intended to do so
since they do not have independent value beyond clarity for the
reviewer.
In this set 0001 and 0002 are independent. 0003-0007 are all small
steps toward the single change in 0007 which combines the VM updates
into the same WAL record as pruning and freezing. 0008 and 0009 are
removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
needed to set the VM during on-access pruning. 0013 - 0015 are small
steps toward setting the VM on-access. And 0016 sets the prune xid on
insert so we may set the VM on-access for pages that have only new
data.
> +static bool
> +heap_page_will_set_vis(Relation relation,
>
> Actually, I wanted to comment on the new function name in last round of review, but I guess I missed that.
>
> I was very confused what “set_vis” means, and finally figured out “vis” should stand for “visibility”. Here “vis” actually means “visibility map bits”. There is the other “vis” in the last parameter’s name “do_set_pd_vis” where the “vis” should be mean “PD_ALL_VISIBLE” bit. So the two “vis” feels making things confusing.
>
> How about rename the function to “heap_page_will_set_vm_bits”, and rename the last parameter to “do_set_all_visible”?
I named it that way because it was responsible for telling us what we
should set the VM to _and_ if we should set PD_ALL_VISIBLE. However,
once I corrected the bug mentioned above, we always set PD_ALL_VISIBLE
if setting the VM, so I was able to remove this ambiguity. As such
I've renamed the function heap_page_will_set_vm() (and removed the
last parameter).
> + * Decide whether to set the visibility map bits for heap_blk, using
> + * information from PruneFreezeResult and all_visible_according_to_vm. This
> + * function does not actually set the VM bit or page-level hint,
> + * PD_ALL_VISIBLE.
> + *
> + * If it finds that the page-level visibility hint or VM is corrupted, it will
> + * fix them by clearing the VM bit and page hint. This does not need to be
> + * done in a critical section.
> + *
> + * Returns true if one or both VM bits should be set, along with the desired
> + * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
> + * PD_ALL_VISIBLE should be set on the heap page.
> + */
> ```
>
> This function comment mentions PD_ALL_VISIBLE twice, but never mentions ALL_FROZEN. So “Returns true if one or both VM bits should be set” fells unclear. How about rephrase like "Returns true if the all-visible and/or all-frozen VM bits should be set.”
PD_ALL_VISIBLE is the page-level visibility hint (not the VM bit) and
there is no page level frozen hint. It doesn't mention that the VM
bits are all-visible and all-frozen, though, so I have modified the
comment a bit to make sure the all-frozen bit of the VM is mentioned.
> + * Now handle two potential corruption cases:
> + *
> + * These do not need to happen in a critical section and are not
> + * WAL-logged.
> + *
> + * As of PostgreSQL 9.2, the visibility map bit should never be set if the
> + * page-level bit is clear. However, it's possible that the bit got
> + * cleared after heap_vac_scan_next_block() was called, so we must recheck
> + * with buffer lock before concluding that the VM is corrupt.
> + */
> + else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
> + visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> + RelationGetRelationName(relation), heap_blk)));
> +
> + visibilitymap_clear(relation, heap_blk, vmbuffer,
> + VISIBILITYMAP_VALID_BITS);
> + }
> ```
>
> Here in the comment and error message, I guess “visibility map bit” refers to “all visible bit”, can we be explicit?
This is an existing comment in lazy_scan_prune() that I simply moved.
It isn't valid for the all-frozen bit to be set unless the all-visible
bit is set. I'm not sure whether specifying which bits were set in the
warning will help users debug the corruption they are seeing. But I
think it is a reasonable suggestion to make. Perhaps it is worth
suggesting this (adding the specific vmbits to the warning message) in
a separate thread since it is an independent improvement on master?
- Melanie
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shlok Kyal | 2025-12-09 17:48:48 | Re: Skipping schema changes in publication |
| Previous Message | Shlok Kyal | 2025-12-09 17:46:59 | Re: Skipping schema changes in publication |