| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
| Date: | 2026-03-06 02:40:53 |
| Message-ID: | F5CDD1B5-628C-44A1-9F85-3958C626F6A9@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Mar 5, 2026, at 16:52, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>> On Mar 4, 2026, at 16:59, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>
>>
>>
>>> On Mar 3, 2026, at 23:52, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>>>
>>>
>>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still possibly be InvalidTransactionId, then the Assert should be changed to something like:
>>>>
>>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin)
>>>
>>> This is covered by TransactionIdPrecedesOrEquals because
>>> InvalidTransactionId is 0. We assume that in many places throughout
>>> the code.
>>>
>>
>> I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but that would leave an impression to code readers that prstate->pagefrz.FreezePageConflictXid could not be InvalidTransactionId. Thus I think my version explicitly tells that prstate->pagefrz.FreezePageConflictXid could be InvalidTransactionId at the point.
>>
>>
>>>> I will continue with 0005 tomorrow.
>>>
>>
>> 4 - 0005
>> ```
>> * Caller must have pin on the buffer, and must *not* have a lock on it.
>> */
>> void
>> -heap_page_prune_opt(Relation relation, Buffer buffer)
>> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>> ```
>>
>> I don’t see why vmbuffer has to be of pointer type. Buffer type is underlying int, I checked the last commit, vmbuffer only passes in data into the function without passing out anything.
>>
>> As we add the new parameter vmbuffer, though it’s not used in this commit, I think it’d be better to update the header commit to explain what this parameter will do.
>>
>> 5 - 0006
>> ```
>> + *
>> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
>> + * page, but it does not need to be done in a critical section because
>> + * clearing the VM is not WAL-logged.
>> + */
>> +static void
>> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
>> ```
>>
>> Nit: why the last paragraph of the header comments uses the function name instead of “this function”? Looks like a copy-pasto.
>>
>> 6 - 0006
>> ```
>> + if (prstate->lpdead_items > 0)
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("LP_DEAD item found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> + RelationGetRelationName(prstate->relation),
>> + prstate->block, offnum)));
>> + }
>> + else
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("tuple not visible to all found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> + RelationGetRelationName(prstate->relation),
>> + prstate->block, offnum)));
>> + }
>> ```
>>
>> I recently just learned that a detail message should use complete sentences, and end each with a period, and capitalize the first word of sentences. See https://www.postgresql.org/docs/current/error-style-guide.html.
>>
>> 7 - 0006
>> ```
>> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
>> + {
>> + /*
>> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
>> + * the page-level bit is clear. However, it's possible that the bit
>> + * got cleared after heap_vac_scan_next_block() was called, so we must
>> + * recheck with buffer lock before concluding that the VM is corrupt.
>> + */
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
>> + prstate->block,
>> + RelationGetRelationName(prstate->relation))));
>> + }
>> ```
>>
>> The comment says “we must recheck with buffer lock before…”, but it only log a warning message. Is the comment stale?
>>
>> 8 - 0007
>> ```
>> +static void
>> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
>> +{
>> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
>> + Page page = prstate->page;
>> +
>> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
>> + (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
>> + !prstate->attempt_freeze));
>> +
>> + /* We'll fill in presult for the caller */
>> + memset(presult, 0, sizeof(PruneFreezeResult));
>> +
>> + /*
>> + * Since the page is all-visible, a count of the normal ItemIds on the
>> + * page should be sufficient for vacuum's live tuple count.
>> + */
>> + for (OffsetNumber off = FirstOffsetNumber;
>> + off <= maxoff;
>> + off = OffsetNumberNext(off))
>> + {
>> + if (ItemIdIsNormal(PageGetItemId(page, off)))
>> + prstate->live_tuples++;
>> + }
>> +
>> + presult->live_tuples = prstate->live_tuples;
>> +
>> + /* Clear any stale prune hint */
>> + if (TransactionIdIsValid(PageGetPruneXid(page)))
>> + {
>> + PageClearPrunable(page);
>> + MarkBufferDirtyHint(prstate->buffer, true);
>> + }
>> +
>> + presult->vmbits = prstate->vmbits;
>> +
>> + if (!PageIsEmpty(page))
>> + presult->hastup = true;
>> +}
>> ```
>>
>> * Given this function has done PageIsEmpty(page), that that is true, we don’t need to count live_tuples, right? That could be a tiny optimization.
>> * I see heap_page_bypass_prune_freeze() is only called in one place and immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0, so do we need to do presult->vmbits = prstate->vmbits;?
>> * Do we need to set all_visible and all_frozen to presult?
>>
>> 0008 LGTM
>>
>> I will continue with 0009 tomorrow.
>>
>
> 9 - 0009
> ···
> + * Currently, only VACUUM performs freezing, but other callers may in the
> + * future. Other callers must initialize prstate.all_frozen to false,
> ···
>
> Nit: prstate.all_frozen -> prstate.set_all_frozen
>
> I saw you have fixed this in 0010, but I think it’s better also fix it here.
>
> 10 - 0010
> ```
> + * Whether or not the page was newly set all-visible and all-frozen during
> + * phase I of vacuuming.
> */
> - uint8 vmbits;
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
>
> These 3 fields are actually counts rather than pointers to blocks, using type BlockNumber are quite confusing, though underlying BlockNumber is uint32. I think they can be just int type.
>
> 11 - 0010
> ```
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
>
> I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
> ```
> PruneFreezeResult presult;
> ```
> So, those fields will hold random values.
>
> 12 - 0010
> ```
> + * conflict would ahve been handled in reaction to the WAL record freezing
> ```
>
> Nit: ahve -> have
>
> 0011 LGTM
>
> 13 - 0012 - bufmask.c
> ```
> + * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
> + * for more details.
> ```
>
> I don’t find a function named heap_xlog_prune_and_freeze().
>
> 14 - 0012 - heapam_xlog.c
> ```
> + * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
> + * heap_xlog_prune_and_freeze()).
> ```
>
> Same as 13.
>
> 0013 LGTM
>
> I will try to finish the rest 5 commits tomorrow.
>
15 - 0014 - execMain.c
```
@@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
rcestate->es_range_table_size = parentestate->es_range_table_size;
rcestate->es_relations = parentestate->es_relations;
rcestate->es_rowmarks = parentestate->es_rowmarks;
+ rcestate->es_modified_relids = parentestate->es_modified_relids;
```
Here it just assigns the BMS pointer to rcestate->es_modified_relids. I am not sure if further bms_add_member() will still happen, if yes, it might be safer to do bms_copy(parentestate->es_modified_relids), because a further bms_add_member() may cause a new memory allocated and the old pointer stale.
16 - 0014 - execUtils.c
```
for (rti = 1; rti <= estate->es_range_table_size; rti++)
```
Nit: I have seen several recent commits that performed cleanups to switch to use for loop var like:
```
for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
```
17 - 0015
The commit message subject line says “Make begin_scan() functions take a flags argument”, where begin_scan() seems inaccurate, for example, table_index_fetch_begin() is not “begin scan”.
Otherwise 0015 LGTM.
18 - 0016 - tableam.h
```
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the relation */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
```
Nit: maybe add an empty line before the new flag.
19 - 0017 - heapam_handler.c
```
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
*/
if (prev_buf != hscan->xs_cbuf)
heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
- &hscan->xs_vmbuffer);
+ &hscan->xs_vmbuffer,
+ hscan->modifies_base_rel);
```
This feels like a bug. heap_page_prune_opt takes the first parameter rel_read_only, but hscan->modifies_base_rel means not read-only, so here we should use “!hscan->modifies_base_rel”.
Oh, when I read back your previous email, you have found this bug.
20 - 0018
In heap_insert(), you do:
```
+ if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+ PageSetPrunable(page, xid);
```
But in heap_multi_insert(), you do:
```
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
```
Is the option check " !(options & HEAP_INSERT_FROZEN))” also needed by heap_multi_insert?
~~ Done of this round review ~~
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-03-06 03:23:30 | Re: Skipping schema changes in publication |
| Previous Message | jian he | 2026-03-06 02:17:35 | Re: UPDATE run check constraints for affected columns only |