| From: | "Greg Burd" <greg(at)burd(dot)me> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Tepid: selective index updates for heap relations |
| Date: | 2026-06-30 17:21:07 |
| Message-ID: | af02b486-6adc-494c-a357-1dd72f655dcf@app.fastmail.com |
| Lists: | pgsql-hackers |
Hello Hackers,
This is a set of patches that extends the heap-only tuple (HOT) update optimization so that only the indexes directly affected by an update need to be updated. The motivation is simple: reduce "bloat" (read: reduce VACUUM overhead) and speed up heap updates by avoiding unnecessary index updates. I call this "tepid" because it is decidedly not heap-only (HOT), nor is it "WARM", nor is it "partially HOT (PHOT)"... it's "Tepid" :)
This is a long one. Thank you in advance for any time you dedicate to this; I appreciate your help and look forward to working together to finish this feature.
Overview
========
HOT keeps an UPDATE off the indexes only when no indexed column changes. Any
update that touches an indexed column today becomes a non-HOT update: a new
heap tuple (often on a new page) plus a fresh entry in *every* index, with the
attendant WAL, bloat, and index write amplification.
Selective Index Updates (SIU, internally "HOT-indexed") lets such an update
stay a heap-only tuple on the same page and insert a fresh entry only into the
indexes whose attributes actually changed. The pre-update index entries are
left in place and become potentially *stale*: an entry for an old key still
leads, through the HOT chain, to the live tuple, whose current key differs.
The new tuple records, inline in its own tail, a small bitmap of which indexed
attributes changed at its hop. A scan that walks the chain to the live tuple
unions the bitmaps of the hops it crossed; if that union overlaps the arriving
index's key columns, the entry is stale and is dropped, and the row is
re-supplied by the fresh entry the same update planted. The crossed-attribute
bitmap -- not a value recheck -- is the staleness authority.
This is, deliberately, the same family of idea as WARM and PHOT.
Performance
===========
A/B run of two release (cassert=off) builds -- origin/master vs the SIU
series -- on a single Apple Silicon laptop (macOS), pgbench, scale 5
(siu_table = 500k rows with 3 secondary indexes; wide_table = 5k rows with 16
secondary indexes + PK), 8 clients / 4 threads, 20 s per cell. pgbench runs
for a fixed time, so each variant completes a different number of updates; the
write-amplification signal is therefore reported as WAL bytes per update.
workload (indexed cols changed)      TPS master->tepid         WAL/update master->tepid
---------------------------------    ----------------------    ------------------------
simple_update (control; HOT both)     32.6k ->  32.1k (  0%)     265 ->  265 B (  0%)
hot_indexed_update (1 of 4)           58.2k ->  68.6k (+18%)     636 ->  487 B (-23%)
wide, 1 of 17 indexes                 33.6k ->  41.7k (+24%)    1466 ->  598 B (-59%)
wide, 8 of 17 indexes                 37.4k ->  47.3k (+26%)    1498 -> 1015 B (-32%)
wide, 16 of 17 indexes                36.3k ->  37.3k ( +3%)    1530 -> 1490 B ( -3%)
read_indexscan (read-only)           164.4k -> 161.3k ( -2%)    n/a (no writes)
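For readers who want to re-derive the percentage columns, here is a small Python sketch of the arithmetic. It is not part of the benchmark harness: pct_change and wal_per_update are hypothetical helper names, and the counters passed to wal_per_update are made-up numbers used only to show the normalization.

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


def wal_per_update(wal_bytes: int, updates: int) -> float:
    """Normalize WAL volume by completed updates so fixed-time runs compare."""
    return wal_bytes / updates


# Endpoints copied from the table above (TPS in thousands, WAL in bytes).
tps_hot_indexed = pct_change(58.2, 68.6)
wal_hot_indexed = pct_change(636, 487)
wal_wide_1_of_17 = pct_change(1466, 598)
print(tps_hot_indexed, wal_hot_indexed, wal_wide_1_of_17)  # 18 -23 -59

# Illustrative only: 1,000,000 WAL bytes over 2,000 updates.
print(wal_per_update(1_000_000, 2_000))  # 500.0
```

Normalizing by completed updates matters because the faster build finishes more updates in the same 20 s, so raw WAL totals would understate the saving.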
Design overview
==================
On-disk marker. A HOT-indexed update makes the new version a heap-only tuple
(HEAP_ONLY_TUPLE) and additionally sets the new infomask2 bit
HEAP_INDEXED_UPDATED (0x0800, previously free). Appended after the tuple's
attribute data, in the final ceil(natts/8) bytes of its line-pointer item, is a
fixed-size bitmap of the indexed attributes that changed at this hop (relative
to the prior chain member). Tuple deforming stops at natts and never sees it.
The bitmap is sized by the tuple's OWN attribute count at write time
(HeapTupleHeaderGetNatts), not the relation's current natts: ADD COLUMN raises
the relation's natts without rewriting existing tuples, so a chain can hold
hops whose bitmaps were sized for different (smaller) natts, and every consumer
locates a hop's bitmap from that hop's own write-time natts. The bitmap is
inline in the data-bearing tuple -- there is no separate "tombstone" line
pointer.
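To make the sizing rule concrete, here is a minimal Python model of a trailing bitmap appended to an item. It is a sketch, not the patch's code: the helper names are hypothetical, and the bit layout (attribute n stored in byte (n-1)/8 at bit (n-1)%8) is an assumption made for illustration.

```python
def bitmap_bytes(natts: int) -> int:
    """ceil(natts / 8): bytes needed for one bit per attribute."""
    return (natts + 7) // 8


def encode_modified_attrs(changed: set, natts: int) -> bytes:
    """Pack 1-based attribute numbers into a fixed-size bitmap.

    Assumed layout (not taken from the patch): attribute n lives in
    byte (n - 1) // 8 at bit (n - 1) % 8.
    """
    out = bytearray(bitmap_bytes(natts))
    for attnum in changed:
        if not 1 <= attnum <= natts:
            raise ValueError(f"attnum {attnum} out of range 1..{natts}")
        out[(attnum - 1) // 8] |= 1 << ((attnum - 1) % 8)
    return bytes(out)


def append_bitmap(attr_data: bytes, changed: set, natts: int) -> bytes:
    """Model an item: attribute data followed by the trailing bitmap."""
    return attr_data + encode_modified_attrs(changed, natts)


# A 4-attribute tuple whose hop changed attributes 2 and 4.
item = append_bitmap(b"row-bytes", {2, 4}, natts=4)
print(item[-1:].hex())                   # 0a
print(bitmap_bytes(8), bitmap_bytes(9))  # 1 2
```

The last line shows the 8-attribute boundary: a ninth attribute grows the bitmap by a byte, which is why the write-time natts has to be used when locating it.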
Write path. heap_update gains a third mode (HEAP_SELECTIVE_INDEX_UPDATE)
alongside HEAP_UPDATE_ALL_INDEXES and HEAP_HEAP_ONLY_UPDATE. It keeps the
new tuple heap-only with its inline bitmap, and the executor inserts fresh
entries only into the indexes whose attributes changed (ExecSetIndexUnchanged);
the fresh entry points at the new heap-only tuple, not at the chain root.
If the page can't fit the new tuple, the update is downgraded to a normal
non-HOT update.
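The selection of which indexes receive a fresh entry amounts to a set intersection per index. A hedged Python sketch follows; indexes_needing_fresh_entry is a hypothetical name, and it models only the decision, not ExecSetIndexUnchanged or any executor data structure.

```python
def indexes_needing_fresh_entry(index_keys: dict, changed: set) -> list:
    """Return names of indexes whose key columns overlap the changed set."""
    return sorted(name for name, keys in index_keys.items() if keys & changed)


# Hypothetical table with a primary key and three single-column indexes.
index_keys = {"t_pkey": {"id"}, "t_a": {"a"}, "t_b": {"b"}, "t_c": {"c"}}

print(indexes_needing_fresh_entry(index_keys, changed={"b"}))       # ['t_b']
print(indexes_needing_fresh_entry(index_keys, changed={"a", "c"}))  # ['t_a', 't_c']
print(indexes_needing_fresh_entry(index_keys, changed={"payload"})) # []
```

The empty result in the last call corresponds to a classic HOT update: no indexed column changed, so no index is touched.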
Read path. heap_hot_search_buffer walks the chain to the live tuple and unions
the per-hop modified-attrs bitmaps of every hop crossed *after* the arriving
entry's own tuple (its own producing hop does not count -- a fresh entry is
never stale for its own index). The index-access layer tests that union
against the arriving index's key columns: overlap => stale, drop; disjoint =>
current, return. No value comparison and no leaf key are needed, so scans
never have to materialize the leaf IndexTuple for staleness purposes, and the
mechanism is identical for every access method.
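The read-path rule can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not heap_hot_search_buffer: visibility checks, redirects, and stubs are left out, and Hop and walk_and_judge are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hop:
    modified: frozenset        # attrs changed on the hop INTO this tuple
    next_off: Optional[int]    # same-page successor, None for the live tuple


def walk_and_judge(page: dict, arrive_at: int, index_keys: frozenset):
    """Return (live_offset, is_stale) for an index entry pointing at arrive_at."""
    crossed = frozenset()
    off = arrive_at
    while page[off].next_off is not None:
        off = page[off].next_off
        crossed |= page[off].modified   # hops AFTER the arriving tuple only
    return off, bool(crossed & index_keys)


# Root at LP[1], then a hop that changed a, then a hop that changed b.
page = {
    1: Hop(frozenset(), 2),
    2: Hop(frozenset({"a"}), 3),
    3: Hop(frozenset({"b"}), None),
}

print(walk_and_judge(page, 1, frozenset({"a"})))  # (3, True)  old a entry: stale
print(walk_and_judge(page, 2, frozenset({"a"})))  # (3, False) fresh a entry
print(walk_and_judge(page, 1, frozenset({"c"})))  # (3, False) untouched index
```

The third call is the case that keeps unchanged indexes cheap: their original entry at the root still resolves through the chain because no crossed hop touched their key columns.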
Prune / collapse. A dead mid-chain HOT-indexed tuple cannot be reclaimed to
LP_UNUSED while stale btree entries still point at its LP, and its bitmap is
what later readers union. prune collapses a dead prefix: each preserved dead
key tuple is rewritten in place as an xid-free "stub" (LP_NORMAL,
HEAP_INDEXED_UPDATED, natts == 0, frozen, t_ctid.offnum forwarding to the next
survivor, carrying the same inline bitmap), and a dead member whose attributes
are wholly subsumed by later hops is reclaimed instead. The root is redirected
to the first survivor. VACUUM's index cleanup sweeps the stale leaves, then a
later prune reclaims the stubs and re-points the redirect, collapsing back to
classic HOT. The collapse rides the existing prune/freeze WAL; logical
decoding sees an ordinary UPDATE.
On-disk format commitment
=========================
This is a permanent format addition and we want explicit agreement before
freezing it:
- One infomask2 bit (0x0800). pg_upgrade is unaffected (clusters predating
SIU have no such items); a pg_upgrade test carries chains, an ABA-cycled
column, a TOASTed indexed column, and collapsed stubs across an upgrade.
- A new interpretation of an LP_NORMAL item:
* a data-bearing HOT-indexed tuple (HEAP_INDEXED_UPDATED, natts >= 1)
with a trailing modified-attrs bitmap sized by the tuple's own natts;
and
* an xid-free collapse-survivor stub (HEAP_INDEXED_UPDATED, natts == 0) --
a signature no real tuple can produce. Because the stub overwrites
natts with the 0 sentinel, it preserves its write-time natts (needed to
size/locate the bitmap) in the otherwise-unused block-number half of
t_ctid; the offset half holds the forward link.
Every consumer of LP_NORMAL heap items must tolerate both. We have audited
the in-tree consumers; visibility-gated paths (seqscan, bitmap, ANALYZE,
index build) are inherently safe because stubs are XMIN_INVALID and the
trailing bitmap is past natts; amcheck and VACUUM/prune are stub-aware;
pg_surgery skips stubs (forcing a freeze/kill on one would corrupt the
heap); pageinspect and pgstattuple are read-only and merely imprecise.
- The bitmap is sized per-tuple by write-time natts, so ADD COLUMN over a
relation with live chains is safe even when it crosses an 8-attribute
boundary (which changes ceil(natts/8)); a regression test exercises
CREATE INDEX / DROP INDEX / ADD COLUMN (boundary-crossing) / DROP COLUMN
after a chain exists and reads back through it.
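The stub signature and the reuse of t_ctid described in the list above can be illustrated with a small Python sketch. Several things here are assumptions rather than facts about the patch: the helper names are hypothetical, the natts value is assumed to be stored as the plain block-number value, and little-endian packing is chosen only for the example. HEAP_NATTS_MASK (0x07FF) is the existing infomask2 mask for the attribute count; HEAP_INDEXED_UPDATED (0x0800) is the new bit named above.

```python
import struct

HEAP_NATTS_MASK = 0x07FF        # low 11 bits of t_infomask2: attribute count
HEAP_INDEXED_UPDATED = 0x0800   # new bit proposed in this series

# ItemPointerData is three 16-bit fields: bi_hi, bi_lo, ip_posid.
# Little-endian packing is an assumption for this sketch.
_TID = struct.Struct("<HHH")


def is_stub(infomask2: int) -> bool:
    """Signature from the text: HEAP_INDEXED_UPDATED set and natts == 0."""
    return bool(infomask2 & HEAP_INDEXED_UPDATED) and (infomask2 & HEAP_NATTS_MASK) == 0


def pack_stub_ctid(write_time_natts: int, forward_off: int) -> bytes:
    """Stash natts in the block-number half, the forward link in the offset."""
    return _TID.pack(write_time_natts >> 16, write_time_natts & 0xFFFF, forward_off)


def unpack_stub_ctid(raw: bytes):
    """Recover (write_time_natts, forward_offset) from a stub's t_ctid."""
    bi_hi, bi_lo, posid = _TID.unpack(raw)
    return (bi_hi << 16) | bi_lo, posid


raw = pack_stub_ctid(write_time_natts=8, forward_off=4)
print(len(raw), unpack_stub_ctid(raw))          # 6 (8, 4)
print(is_stub(HEAP_INDEXED_UPDATED))            # True
print(is_stub(HEAP_INDEXED_UPDATED | 4))        # False: natts == 4, a real tuple
```

Since a stub never leaves its page, the block-number half carries no location information, which is what frees it to hold the natts needed to size the stub's bitmap.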
Alternatives considered and rejected: a separate relation fork (heavy, and the
marker must be co-located with the tuple for the chain walk); a separate
adjacent "tombstone" LP per hop (doubles line-pointer pressure; the inline
trailing bitmap needs no extra item); "redirect-with-data" LP_REDIRECT carrying
the bitmap (LP_REDIRECT has no storage for a payload); a new line-pointer flavor
(consumes scarce lp_flags space and touches far more code than reusing
LP_NORMAL + an infomask2 bit).
Eligibility
===========
A non-summarizing indexed attribute changing -- under any access method --
yields HEAP_SELECTIVE_INDEX_UPDATE unless a carve-out (6.1) applies. The cases
that DO work and are covered by tests are in 6.2; the distinction is deliberate,
because several restrictions an earlier (value-recheck) draft needed turned out
to be unnecessary once the crossed-attribute bitmap became the staleness
authority.
Carve-outs (deliberately conservative)
--------------------------------------
- System catalogs. Catalogs keep classic HOT, but a catalog UPDATE that
changes a non-summarizing indexed attribute never takes the HOT-indexed
path: catalogs are reached through access paths (systable scans,
SnapshotDirty unique checks, seqscans) we have not proven safe.
- Expression indexes: an UPDATE that changes an attribute an expression index
references. The bitmap is attribute-granular and cannot tell whether the
expression's VALUE changed; expression-aware selective maintenance is not
wired up. (This restriction may be liftable the same way the partial-index
one was -- see 6.2 -- but is kept until tested.)
- Every indexed attribute changed. Nothing can be skipped, so a plain
non-HOT update is cheaper (it avoids the chain-walk and bitmap overhead).
"Every" is an exact test; there is no percentage GUC.
- The logical-replication apply path, gated per subscription by
hot_indexed_on_apply (off / subset_only (default) / always): a HOT-indexed
update of a replica-identity attribute leaves a stale leaf the apply
worker's RI lookup must tolerate, which it does only when the indexed
attributes are a subset of the primary key.
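The carve-outs above, together with the page-full downgrade, can be read as a small decision function. The Python sketch below is one reading of the rules, not the patch's logic: choose_mode and its flags are hypothetical, the mapping of the three mode names to "classic HOT", "HOT-indexed", and "non-HOT" is inferred from the text, summarizing indexes are assumed to be excluded from the indexed set already, and the logical-replication apply gate is left out.

```python
from enum import Enum


class UpdateMode(Enum):
    HEAP_HEAP_ONLY_UPDATE = "classic HOT"
    HEAP_SELECTIVE_INDEX_UPDATE = "HOT-indexed"
    HEAP_UPDATE_ALL_INDEXES = "non-HOT"


def choose_mode(changed, indexed, *, is_catalog=False,
                changed_hits_expression_index=False, fits_on_page=True):
    """Pick an update mode from the changed and indexed attribute sets."""
    changed_indexed = changed & indexed
    if not fits_on_page:
        return UpdateMode.HEAP_UPDATE_ALL_INDEXES      # page-full downgrade
    if not changed_indexed:
        return UpdateMode.HEAP_HEAP_ONLY_UPDATE        # nothing indexed changed
    if is_catalog or changed_hits_expression_index:
        # Assumption: a carve-out falls back to the pre-SIU rule, which for
        # a changed indexed column is a plain non-HOT update.
        return UpdateMode.HEAP_UPDATE_ALL_INDEXES
    if changed_indexed == indexed:                     # exact "every" test
        return UpdateMode.HEAP_UPDATE_ALL_INDEXES
    return UpdateMode.HEAP_SELECTIVE_INDEX_UPDATE


indexed = {"id", "a", "b", "c"}
print(choose_mode({"payload"}, indexed).value)             # classic HOT
print(choose_mode({"a"}, indexed).value)                   # HOT-indexed
print(choose_mode({"id", "a", "b", "c"}, indexed).value)   # non-HOT
print(choose_mode({"a"}, indexed, is_catalog=True).value)  # non-HOT
```

Note that the "every indexed attribute" test is a set equality, matching the statement that there is no percentage threshold.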
State model: classic HOT and the HOT/SIU state changes
=========================================================
This section catalogs every relevant state and traces the outcomes, because the
correctness argument is entirely about which states a chain and its index
entries pass through. Notation: LP[n] is the line pointer at offset n; "{a,b}"
is a modified-attrs bitmap; "->" in t_ctid is the same-page successor offset.
Line-pointer (ItemId) states
----------------------------
LP_UNUSED Free slot, no storage.
LP_NORMAL Points to an item (lp_off, lp_len). In SIU this item is one of:
- a real tuple (classic);
- a HOT-indexed tuple (HEAP_INDEXED_UPDATED, natts >= 1, with
a trailing bitmap); or
- a collapse-survivor stub (HEAP_INDEXED_UPDATED, natts == 0,
xid-free, forwarding via t_ctid.offnum). [NEW in SIU]
LP_REDIRECT Points to another offset; no tuple. Created by prune when a
chain root dies but heap-only members remain. (Unchanged by
SIU, but now there may be more than one redirect forwarding to
the same live tuple after a collapse.)
LP_DEAD Dead, reclaimable, no storage.
Tuple flag states (t_infomask2) and chain roles
------------------------------------------------
HEAP_HOT_UPDATED This tuple was HOT-updated; its t_ctid successor is a
heap-only tuple on the same page. (classic)
HEAP_ONLY_TUPLE No index entry points *directly* at this tuple; entries
point at the chain root and the tuple is reached by
walking t_ctid. (classic)
HEAP_INDEXED_UPDATED [NEW] This heap-only tuple belongs to a HOT-indexed
chain and carries an inline trailing modified-attrs
bitmap of the attributes that changed at this hop (the
bitmap is empty for a classic-HOT update promoted to keep
a HOT-indexed chain uniform). With natts == 0 the same
bit marks a collapse-survivor stub.
Root tuple First tuple in the chain; the tuple classic index entries
point at. Not heap-only.
Heap-only tuple A chain member reached via t_ctid. Under SIU a heap-only
tuple may ALSO be pointed at directly by a fresh
HOT-indexed index entry (this is the key departure from
classic HOT, where only the root is pointed at).
Index-entry states
------------------
Fresh entry Points at the heap-only tuple version whose indexed key it
matched at insertion. Its walk to the live tuple crosses no
later hop that changed its index's key, so the crossed union is
disjoint from its key columns: kept.
Stale entry A pre-update entry whose key the live tuple no longer holds (or
holds again only by coincidence after a cycle). Its walk
crosses a hop that changed its index's key: the union overlaps,
so it is dropped. The live row is re-supplied by the fresh
entry.
Read-side transient state (per scan, not on disk) [NEW]
-------------------------------------------------------
xs_hot_indexed_recheck The chain walk crossed a HOT-indexed hop after the
arriving entry's own tuple.
xs_hot_indexed_crossed The union of those crossed hops' modified-attrs
bitmaps (complete -- see Section 7).
xs_hot_indexed_stale Verdict: xs_hot_indexed_crossed overlaps the
arriving index's key columns. The executor (and
CLUSTER, IOS, and the apply RI lookups) drop the
tuple.
State transitions
-----------------
INSERT LP_NORMAL root tuple, not heap-only. One entry per index.
Classic HOT UPDATE (no indexed col changed) old tuple: +HEAP_HOT_UPDATED,
t_ctid -> new; new tuple: HEAP_ONLY_TUPLE. No new index
entries.
HOT-indexed UPDATE (some, not all, indexed cols changed; eligible) old
tuple: +HEAP_HOT_UPDATED, t_ctid -> new; new tuple:
HEAP_ONLY_TUPLE + HEAP_INDEXED_UPDATED + inline bitmap of
the changed attrs; fresh entries inserted only into the
changed indexes, each pointing at the new tuple;
unchanged indexes are untouched (their existing entries
still resolve through the chain).
Non-HOT UPDATE (ineligible, or page full) new tuple on a (possibly new)
page; a fresh entry in *every* index.
Prune/collapse dead prefix members -> reclaimed (bitmap subset of later
hops) or rewritten to xid-free stubs (forwarding,
bitmap-preserving); root -> LP_REDIRECT to first survivor.
VACUUM ambulkdelete sweeps stale leaves; a later pass reclaims
stubs and re-points the redirect -> classic HOT.
Worked example 1 -- selective maintenance and a stale drop
-----------------------------------------------------------
t(id PK, a, b, c), indexes t_a(a), t_b(b), t_c(c), fillfactor 50.
INSERT (1,10,20,30); UPDATE a=11; UPDATE b=21; UPDATE c=31.
Chain (the bitmap on a tuple = attrs changed on the hop INTO it):
LP[1] v1(a=10,b=20,c=30) root, HEAP_HOT_UPDATED, ->2 dead
LP[2] v2(a=11,b=20,c=30) heap-only, INDEXED_UPDATED{a}, ->3 dead
LP[3] v3(a=11,b=21,c=30) heap-only, INDEXED_UPDATED{b}, ->4 dead
LP[4] v4(a=11,b=21,c=31) heap-only, INDEXED_UPDATED{c} live
Index entries (fresh point mid-chain at the tuple they matched):
t_a: (10)->LP[1] stale (11)->LP[2] fresh
t_b: (20)->LP[1] stale (21)->LP[3] fresh
t_c: (30)->LP[1] stale (31)->LP[4] fresh
Scan a=11 via t_a -> LP[2]:
arrive AT LP[2] (own hop {a} not counted); cross ->3 {b}, ->4 {c}.
crossed = {b,c}; t_a keys = {a}; {a} & {b,c} = {} => fresh => return v4. OK
Scan a=10 via t_a -> LP[1] (stale):
cross ->2 {a}, ->3 {b}, ->4 {c}.
crossed = {a,b,c}; {a} & {a,b,c} = {a} => stale => drop. OK
(v4 is supplied once, by the fresh (11)->LP[2] entry.)
Scan b=21 via t_b -> LP[3]: crossed ->4 {c}; {b} & {c} = {} => return v4. OK
Worked example 2 -- ABA (the case a value recheck gets wrong)
-------------------------------------------------------------
INSERT (1,10,...); UPDATE a=11; UPDATE a=10. (a cycles 10 -> 11 -> 10)
LP[1] v1(a=10) root ->2 dead
LP[2] v2(a=11) {a} ->3 dead
LP[3] v3(a=10) {a} live
t_a: (10)->LP[1] stale (11)->LP[2] stale (10)->LP[3] fresh
Scan a=10 finds TWO entries with key 10 (LP[1] and LP[3]):
via LP[3]: zero hops crossed => fresh => return v3. OK
via LP[1]: cross ->2 {a}, ->3 {a}; crossed={a}; {a}&{a}={a} => drop. OK
Returned exactly once. A value recheck would compare leaf key 10 against
live a=10 for BOTH entries and keep both -> duplicate. The bitmap drops the
ancestor because a *changed* after LP[1], regardless of the coincident value.
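The ABA case is easy to reproduce in a toy model. The Python sketch below encodes the chain and the t_a leaves from this example and runs the same scan under both staleness rules; it is illustrative only (no visibility, no real index), and all names are hypothetical.

```python
# Chain from the ABA example: offset -> (value of a, bitmap, next offset).
chain = {
    1: (10, frozenset(), 2),
    2: (11, frozenset({"a"}), 3),
    3: (10, frozenset({"a"}), None),
}
# t_a leaf entries as (key, heap offset).
t_a_entries = [(10, 1), (11, 2), (10, 3)]


def live_and_crossed(start):
    """Walk to the live tuple, collecting bitmaps of hops after `start`."""
    crossed, off = frozenset(), start
    while chain[off][2] is not None:
        off = chain[off][2]
        crossed |= chain[off][1]
    return off, crossed


def scan(key, use_bitmap):
    """Return the live offsets a scan for a=key would emit."""
    hits = []
    for leaf_key, off in t_a_entries:
        if leaf_key != key:
            continue
        live, crossed = live_and_crossed(off)
        if use_bitmap:
            keep = not (crossed & {"a"})          # crossed-attribute rule
        else:
            keep = chain[live][0] == leaf_key     # value recheck
        if keep:
            hits.append(live)
    return hits


print(scan(10, use_bitmap=False))  # [3, 3] -> the row is returned twice
print(scan(10, use_bitmap=True))   # [3]
print(scan(11, use_bitmap=True))   # []
```

The value recheck cannot tell the ancestor entry from the fresh one because both carry key 10 and the live tuple holds 10; the bitmap rule separates them by history rather than by value.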
Worked example 3 -- REINDEX over the chain
------------------------------------------
REINDEX t_a after example 2 rebuilds one entry, pointing at the live tuple
mid-chain: (10)->LP[3]. Zero hops crossed => fresh => returned once. OK
(The rebuild points at the live member, not the root, so it is never seen as
stale -- this is required for "drop on overlap" to be safe; a build that
pointed at the root carrying the live value would be wrongly dropped.)
Worked example 4 -- collapse to xid-free stubs
-----------------------------------------------
From example 1, VACUUM finds LP[1..3] dead, LP[4] live. Walking the dead
prefix from the live end, accumulating the union of later hops (laterattrs):
seed laterattrs from the live remainder LP[4]: {c}.
LP[3] {b}: {b} not-subset {c} -> still has a live fresh entry (21)->LP[3]; keep as
stub forwarding ->4. laterattrs |= {b} => {b,c}.
LP[2] {a}: {a} not-subset {b,c} -> keep as stub forwarding ->3. laterattrs => {a,b,c}.
LP[1] root -> LP_REDIRECT ->2 (first survivor).
Result:
LP[1] redirect ->2
LP[2] stub{a} forward ->3
LP[3] stub{b} forward ->4
LP[4] live v4
Scan a=11 via t_a (11)->LP[2]:
arrive AT LP[2] stub (own segment {a} not counted); forward ->3 stub {b},
->4 {c}; crossed={b,c}; {a}&{b,c}={} => fresh => return v4. OK
Scan a=10 via t_a (10)->LP[1] redirect ->2:
follow redirect to LP[2] (now a crossed segment) {a}, ->3 {b}, ->4 {c};
crossed={a,b,c}; {a}&{a,b,c}={a} => stale => drop. OK
Had a dead member's attributes been fully subsumed by later hops (e.g. a
second a-changing hop after LP[2]), LP[2] would be reclaimed (LP_DEAD) rather
than stubbed: its entries are already superseded, so no live entry references
it, and its {a} is still carried by the later survivor the reader crosses.
Once every entry pointing into the chain is swept by ambulkdelete and the
whole chain is dead, a later prune reclaims the stubs to LP_UNUSED and
re-points the root redirect straight at the live tuple -- the page is back to
classic HOT.
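The keep-or-reclaim decision in this example can be sketched as a backward walk with a running union. The Python below is a simplified model, not the prune code: collapse_dead_prefix is a hypothetical name, it only classifies the dead heap-only members, and it does not model forwarding links, the root redirect, or WAL.

```python
def collapse_dead_prefix(dead_hops, live_bitmap):
    """Classify dead heap-only members, walking from the live end backward.

    dead_hops: list of (offset, bitmap) in chain order, root excluded.
    Returns {offset: "stub" | "reclaim"}.
    """
    later = set(live_bitmap)          # laterattrs, seeded from the live tuple
    fate = {}
    for off, bitmap in reversed(dead_hops):
        if bitmap <= later:
            fate[off] = "reclaim"     # wholly subsumed by later hops
        else:
            fate[off] = "stub"        # bitmap still needed by readers
        later |= bitmap
    return fate


# Example 1's chain: LP[2] {a}, LP[3] {b} dead; LP[4] {c} live.
print(collapse_dead_prefix([(2, {"a"}), (3, {"b"})], {"c"}))
# {3: 'stub', 2: 'stub'}

# Variant with a second a-changing hop after LP[2].
print(collapse_dead_prefix([(2, {"a"}), (3, {"a"})], {"c"}))
# {3: 'stub', 2: 'reclaim'}
```

In the variant, LP[2] is reclaimed because the later LP[3] hop already carries {a}, so any reader arriving from an older entry still crosses an a-changing hop.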
Worked example 4a -- prune and vacuum, step by step
----------------------------------------------------
A fuller trace of the same chain, separating what PRUNE does (the collapse)
from what VACUUM does (the index sweep and final reclaim). Note there is no
"redirect-with-data": the root becomes a plain LP_REDIRECT and the per-hop
bitmaps live on the stubs, which the reader crosses one by one.
Table siu_collapse(id, a, b, c), indexes siu_coll_a(a), siu_coll_b(b),
siu_coll_c(c):
INSERT (1,10,20,30); UPDATE a=11; UPDATE b=21; UPDATE c=31;
(0) Chain after the three HOT-indexed updates, before any prune. Each new
version is a heap-only tuple carrying the bitmap of what changed at its
hop; each changed index got a fresh entry at the new tuple's own TID,
and the pre-update entries remain (now stale).
LP[1] v1(a=10,b=20,c=30) root, HEAP_HOT_UPDATED, ->2 dead
LP[2] v2(a=11,b=20,c=30) heap-only, {a}, ->3 dead
LP[3] v3(a=11,b=21,c=30) heap-only, {b}, ->4 dead
LP[4] v4(a=11,b=21,c=31) heap-only, {c} live
siu_coll_a: (10)->LP[1] stale (11)->LP[2] fresh
siu_coll_b: (20)->LP[1] stale (21)->LP[3] fresh
siu_coll_c: (30)->LP[1] stale (31)->LP[4] fresh
(1) PRUNE (on-access heap_page_prune_opt, or VACUUM's first pass) finds
LP[1..3] dead and LP[4] live, and collapses the dead prefix. Walking
from the live end, accumulating laterattrs (the union of later hops):
seed laterattrs = LP[4] {c}
LP[3] {b}: {b} not subset of {c} -> keep as stub ->4; laterattrs={b,c}
LP[2] {a}: {a} not subset of {b,c} -> keep as stub ->3; laterattrs={a,b,c}
LP[1] root -> LP_REDIRECT ->2 (first survivor)
A dead member is reclaimed outright (LP_DEAD) instead of stubbed only
when its bitmap is a subset of the later hops -- then no live entry
references it and a later survivor still carries its attributes. Here
none qualify, so all three are kept. Result:
LP[1] redirect ->2
LP[2] stub {a} forward ->3 (xid-free: XMIN/XMAX_INVALID, natts==0)
LP[3] stub {b} forward ->4 (xid-free)
LP[4] live v4
The page is kept non-all-visible while a stub remains, so index-only
scans heap-fetch through it. The stale leaves (10/20/30 ->LP[1]) and
the fresh leaves still point where they did; only the heap changed.
(2) Reads against the collapsed page:
Query a=11 via siu_coll_a, fresh entry (11)->LP[2]:
arrive AT LP[2] stub (its own {a} is the entry's own hop, not counted);
cross ->3 {b}, ->4 {c}; crossed={b,c}; siu_coll_a key {a};
{a} & {b,c} = {} => current => return v4. OK
Query b=21 via siu_coll_b, fresh entry (21)->LP[3]:
arrive AT LP[3] stub; cross ->4 {c}; crossed={c}; {b}&{c}={}
=> current => return v4. OK
Query a=10 via siu_coll_a, STALE entry (10)->LP[1]:
LP[1] is a plain redirect -> follow to LP[2]; now crossing the
collapsed segment: LP[2] {a}, ->3 {b}, ->4 {c}; crossed={a,b,c};
{a} & {a,b,c} = {a} => stale => drop (v4 is supplied once, by the
fresh (11)->LP[2] entry). OK
(3) VACUUM index cleanup (ambulkdelete) removes the now-removable stale
leaves (10/20/30 ->LP[1]); kill_prior_tuple / bottom-up deletion also
remove them opportunistically. VACUUM's heap second pass
(lazy_vacuum_heap_page) does NOT collapse or re-point anything; it only
turns LP_DEAD line pointers into LP_UNUSED.
(4) Final reclaim. Once every entry into the chain has been swept and the
whole chain is dead, a later PRUNE reclaims the stubs to LP_UNUSED and
re-points the root redirect straight at the live tuple:
LP[1] redirect ->4 (or reclaimed if no entry references the root)
LP[2] LP_UNUSED
LP[3] LP_UNUSED
LP[4] live v4
No SIU metadata remains on the page; it is indistinguishable from a
classic-HOT chain that has been pruned.
Worked example 5 -- ADD COLUMN across a bitmap-size boundary
-------------------------------------------------------------
The bitmap is ceil(natts/8) bytes, sized by the tuple's natts AT WRITE TIME.
ADD COLUMN raises the relation's natts but does not rewrite existing tuples,
so a chain can hold hops sized for different natts. The sharp case is
crossing an 8-attribute boundary, where ceil(natts/8) grows by a byte; a
reader that sized the bitmap from the relation's *current* natts would read
the wrong trailing bytes. Every consumer instead uses the hop's own
write-time natts (HotIndexedTupleBitmapNatts: HeapTupleHeaderGetNatts for a
live tuple, the stub's stashed natts otherwise).
t(c1 PK, c2, ..., c7, payload), exactly 8 attrs; indexes t_c2(c2), t_c7(c7).
INSERT (...,c7=70,...); UPDATE c7=71; UPDATE c7=72.
LP[1] v1(c7=70) root ->2 dead
LP[2] v2(c7=71) {c7} ->3 dead bitmap 1 byte (natts=8)
LP[3] v3(c7=72) {c7} live bitmap 1 byte (natts=8)
Now ALTER TABLE t ADD COLUMN c9 int; -- relation natts 8 -> 9; ceil 1 -> 2
A subsequent UPDATE c7=73 appends a hop sized for natts=9 (2 bytes):
LP[1] v1(c7=70) root ->2 dead (1-byte bitmap)
LP[2] v2(c7=71) {c7} ->3 dead (1-byte bitmap)
LP[3] v3(c7=72) {c7} ->4 dead (1-byte bitmap)
LP[4] v4(c7=73) {c7} live (2-byte bitmap)
Scan c2=<unchanged> via t_c2 -> LP[1] (root entry; c2 never changed):
cross ->2 {c7}, ->3 {c7}, ->4 {c7}; each located by its own write-time
natts (1 byte for LP[2,3], 2 bytes for LP[4]) and OR-ed into the
relnatts-sized accumulator. crossed={c7}; {c2} & {c7} = {} => the c2
entry is current => return v4. OK
(Sizing the LP[2,3] bitmaps with the relation's current natts=9 would read
one byte of attribute data as bitmap and could spuriously set a bit,
wrongly dropping the current c2 entry -- which the per-hop sizing avoids.)
Scan c7=72 via t_c7 -> LP[3] (now stale): cross ->4 {c7}; {c7}&{c7}={c7}
=> drop. c7=73 via the fresh (73)->LP[4] entry => return v4. OK
Collapse preserves this: a stub records its write-time natts in the unused
block half of t_ctid (the offset half is the forward link), so a stubbed
1-byte hop and a live 2-byte hop coexist in one collapsed chain and each is
read at its own size. DROP COLUMN keeps the attnum slot (no renumber), so
bit positions and natts are unchanged and existing bitmaps stay aligned.
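The mis-sizing hazard in this example can be shown with literal bytes. The Python sketch below is illustrative only: read_bitmap is a hypothetical helper, the items are hand-built byte strings rather than real tuples, and the bit layout (attribute n at bit (n-1)%8 of byte (n-1)/8) is an assumption.

```python
def bitmap_bytes(natts):
    """ceil(natts / 8)."""
    return (natts + 7) // 8


def read_bitmap(item: bytes, natts: int) -> set:
    """Decode the trailing bitmap, sized by the natts the caller supplies.

    Assumed bit layout: attribute n is bit (n - 1) % 8 of byte (n - 1) // 8.
    """
    tail = item[len(item) - bitmap_bytes(natts):]
    return {i * 8 + bit + 1
            for i, byte in enumerate(tail)
            for bit in range(8) if byte & (1 << bit)}


# A hop written at natts=8 that changed c7 (attnum 7 -> 0x40). The byte
# just before the bitmap is attribute data that happens to be 0x02.
old_hop = b"\x11\x02" + b"\x40"
# A hop written at natts=9 that changed c7: two bitmap bytes.
new_hop = b"\x11\x02" + b"\x40\x00"

# Per-hop write-time natts: only c7 (attnum 7) is reported as changed.
right = read_bitmap(old_hop, natts=8) | read_bitmap(new_hop, natts=9)
# Relation's current natts for every hop: a data byte is read as bitmap.
wrong = read_bitmap(old_hop, natts=9) | read_bitmap(new_hop, natts=9)

print(sorted(right))  # [7]
print(sorted(wrong))  # [2, 7, 15]
```

In the wrong reading, attribute 2 (c2) appears changed, so a scan through t_c2 would drop an entry that is in fact current.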
Open questions for the list
===========================
(a) Is the crossed-attribute-bitmap staleness model acceptable in principle?
It adds a per-hop on-disk bitmap and a chain-walk union to the read path,
and weakens the "an index entry accurately reflects the indexed value"
contract.
(b) Is the on-disk format (see "On-disk format commitment") acceptable?
Fin
===
I hope (some of) you made it this far. :)
I'd appreciate feedback or review of the code and/or approach. I'm sure (I hope!) there will be debate and constructive feedback. This patch set starts with the ideas from another thread [1] and may eventually end up addressing that thread's specific goal (expanding HOT for expression indexes), but does not do that yet. For those inclined, there's also a wiki page [2] where I hope to fully capture this idea for posterity.
best.
-greg
[1] https://commitfest.postgresql.org/patch/5556/
[2] https://wiki.postgresql.org/wiki/Heap_HOT_Selective_Index_Updates
| Attachment | Content-Type | Size |
|---|---|---|
| v48-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch | text/x-patch | 45.4 KB |
| v48-0002-Identify-modified-indexed-attributes-in-the-exec.patch | text/x-patch | 62.2 KB |
| v48-0003-Add-the-HOT-indexed-on-disk-format-inline-attr-b.patch | text/x-patch | 12.8 KB |
| v48-0004-Add-HOT-indexed-updates-selective-index-maintena.patch | text/x-patch | 228.7 KB |
| v48-0005-Collapse-dead-HOT-indexed-chains-to-xid-free-stu.patch | text/x-patch | 53.2 KB |
| v48-0006-Teach-amcheck-to-recognize-HOT-indexed-chains-an.patch | text/x-patch | 17.7 KB |
| v48-0007-Add-HOT-indexed-statistics-and-the-comprehensive.patch | text/x-patch | 160.2 KB |
| v48-0008-Gate-HOT-indexed-updates-on-the-logical-replicat.patch | text/x-patch | 115.7 KB |
| v48-0009-DO-NOT-MERGE-Add-a-HOT-SIU-benchmark-harness.patch | text/x-patch | 33.4 KB |