| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
| Date: | 2025-12-22 17:57:16 |
| Message-ID: | CAAKRu_ZCjHoRPfQ8AbMrFY8TOMCPAvZ0_m9SX7yg0edfTk45-g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Dec 22, 2025 at 2:20 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> A few more comments on v29:
Thanks for the continued review! I've attached v30.
> 1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
I was torn about whether or not to change the return value. Coverity
doesn't always warn about unused return values. Usually it warns if it
perceives the return value as needed for error checking or if it
thinks not using the return value is incorrect. It may still warn in
this case, but it's not obvious to me which way it would go.
I have changed the function signature as you suggested in v30.
My hesitation is that visibilitymap_set() is in a header file and
could be used by extensions/forks, etc. Adding more information by
changing a return value from void to non-void doesn't have any
negative effect on those potential callers. But taking away a return
value is more likely to affect them in a potentially negative way.
However, I'm significantly changing the signature in this release, so
everybody that used it will have to change their code completely
anyway. Also, I just added a return value for visibilitymap_set() in
the previous release (18). Historically, it returned void. So, I've
gone with your suggestion.
> +static bool
> +identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
> + BlockNumber heap_blk, Page heap_page,
> + int nlpdead_items,
> + Buffer vmbuffer,
> + uint8 vmbits)
> +{
> + Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
> ```
>
> Right before this function is called:
> ```
> old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
> + if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
> + presult.lpdead_items, vmbuffer,
> + old_vmbits))
> ```
>
> So, the Assert() is checking if old_vmbits is newly returned from visibilitymap_get_status(), in that case, identify_and_fix_vm_corruption() can take vmbits as a pointer , and it calls visibilitymap_get_status() to get vmbits itself and returns vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.
I see what you are saying, and I did consider this.
visibilitymap_get_status() is only called the second time in assert
builds, and it isn't so expensive to do it that it is worth worrying
about. I added the assertion to prevent other callers from calling
identify_and_fix_vm_corruption() with random VM bits unassociated with
the vmbuffer passed in.
The reason I don't think identify_and_fix_vm_corruption() should be
the one to call visibilitymap_get_status() and initialize old_vmbits
is that it shouldn't be a required step to setting the VM.
identify_and_fix_vm_corruption()'s job is to identify and fix
corruption -- not get the VM bits for when we set them. In fact, it
may make sense someday to check that the VM and PD_ALL_VISIBLE are in
sync before pruning and freezing is even started. (Of course, we can't
check the number of lpdead items until after).
Regarding having *old_vmbits as a return value. I thought about
directly returning the result of visibilitymap_clear() from
identify_and_fix_vm_corruption(). The reason I didn't is that if
PD_ALL_VISIBLE is set and nlpdead_items > 0 but the VM is clear,
visibilitymap_clear() will return false -- because it didn't need to
clear the VM bits. And I think we want
identify_and_fix_vm_corruption() to return true if it cleared
corruption at all.
I don't think we should have identify_and_fix_vm_corruption() reset
old_vmbits to 0 (and pass it by reference), because the caller may
want to know the value of old_vmbits before we cleared corruption.
> + * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
> + * we have attempted to update the VM.
> + */
> + uint8 new_vmbits;
> + uint8 old_vmbits;
> ```
>
> The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how to decide "we have attempted to update the VM”? By reading the code:
> ```
> + prstate->attempt_update_vm =
> + (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
>
> It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t the “and” part.
Good point. Fixed.
> +static bool
> +heap_page_will_set_vm(PruneState *prstate,
> + Relation relation,
> + BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
> + Buffer vmbuffer,
> + int nlpdead_items,
> + uint8 *old_vmbits,
> + uint8 *new_vmbits)
> +{
> + if (!prstate->attempt_update_vm)
> + return false;
> ```
>
> old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
>
> I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().
I see your point. It does feel a little bit different to me since they
are local variables and coverity may not actually be able to tell they
are being unconditionally initialized by heap_page_will_set_vm(). The
other local variables that are not initialized at the top are all
unconditionally set by helper return values. But my decision to
initialize them was more instinct than rationality. I've changed it as
you suggested.
> - * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
> - * will return 'all_visible', 'all_frozen' flags to the caller.
> + * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
>
> Nit: a tailing dot is needed in the end of the comment line.
I've changed it. One interesting thing is that our "policy" for
periods in comments is that we don't put periods at the end of
one-line comments and we do put them at the end of mult-line comment
sentences. This is a one-line comment inside a comment block, so I
wasn't sure what to do. If you noticed it, and it bothered you, it's
easy enough to change, though.
> @@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> Buffer vmbuffer = params->vmbuffer;
> Page page = BufferGetPage(buffer);
> BlockNumber blockno = BufferGetBlockNumber(buffer);
> + TransactionId vm_conflict_horizon = InvalidTransactionId;
> ```
>
> I guess the variable name “vm_conflict_horizon” comes from the old "presult->vm_conflict_horizon”. But in the new logic, this variable is used more generic, for example Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed to “conflict_xid”, so it’s up to you if or not rename it. But to make the commit self-contained, I’d suggest renaming it.
As of this patch, it is still being exclusively used as the conflict
XID for setting the visibility map. And it still is the visibility
horizon. I rename it to conflict xid once it includes more than just
the visibility horizon for an all-visible page. In that assertion, it
is also the visibility horizon for an all-visible page.
> 9 - 0006
>
> @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
> {
> ItemId itemid;
> HeapTupleData tuple;
> + TransactionId dead_after = InvalidTransactionId;
> ```
>
> This initialization seems to not needed, as HeapTupleSatisfiesVacuumHorizon() will always set a value to it.
I think this is a comment for a later patch in the set (you originally
said it was from 0006), but I've changed dead_after to not be
initialized like this.
> + /*
> + * RT indexes of relations modified by the query either through
> + * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
> + */
> + Bitmapset *es_modified_relids;
> ```
>
> As we intentionally only want indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit.
I'm torn about this. I named it like this partially because the struct
member two above it in the estate, es_unpruned_relids, is also a
bitmapset of range table indexes and yet is called x_relids. Though
the bitmapset is one of indexes into the range table, they are the
indexes of relation IDs in that range table. I think this could go
either way, so I've left it as is for now and will think more about it
once this patch is closer to being committed.
> + /* If it has a rowmark, the relation is modified */
> + estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
> + rc->rti);
> ```
>
> I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. If a reader not associating this code with this patch, he may consider the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase like “If it has a rowmark, the relation may modify or lock heap pages”.
I see what you are saying. It's a good point. However, the reason we
don't want to set the VM for SELECT FOR UPDATE is not because the
SELECT FOR UPDATE will lock the relation but because it is usually
indicating that we intend to modify the relation (when we do the
update). As such, I've updated the comment to say "If it has a
rowmark, the relation may be modified" -- which leaves it more open.
- Melanie
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Heikki Linnakangas | 2025-12-22 18:20:40 | Re: A few patches to clarify snapshot management, part 2 |
| Previous Message | Alexander Borisov | 2025-12-22 17:55:28 | Re: Improve the performance of Unicode Normalization Forms. |