Quick Links

Re: Expanding HOT updates for expression and partial indexes

From:	"Greg Burd" <greg(at)burd(dot)me>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Expanding HOT updates for expression and partial indexes
Date:	2026-02-10 15:09:36
Message-ID:	97769442-2ed7-4fe0-a393-7e2b5ff7ff58@app.fastmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello,

TL;DR, I'm going to put a pin in this idea for now.

I'll net out where this patch set is quickly and leave the majority of the
detail for anyone curious in an addendum below.

The patch set enables HOT updates a few new cases:
* when expressions on indexes remain constant
* when updates fall outside partial index predicates
* when indexes report that their keys are unchanged by the update

The benefits are:
* more HOT updates
* reduced index bloat
* modified index attributes identified ahead of table AM update

The drawbacks are:
* the overhead of doing net-new work determining modified attributes
* higher concurrency causing retries, compounding that work
* index and predicate expressions executed more frequently

Performance:
------------

The good,
* HOT improvements: 95-100%
* Bloat reduction: 50-98%
* Update 100 rows at once: +27% throughput, -98% bloat

the bad,
* -2% to -6% throughput regression on updates across the board

and the ugly.
- net new work finding modified indexed attributes
- net new retries on update due to added concurrency (TU_Updated)
- retries then must re-identify modified indexed attributes

While I still believe in expanding HOT updates and that moving the logic for
finding the set of modified index attributes into the executor was a good idea,
it is late in the v19 cycle and at this stage the trade-offs feel unacceptable.

So I'm going to withdraw this patch and pause here on development and reconsider
my approach in the v20 cycle. I have some ideas on how to avoid the need to
evaluate these expressions entirely and I still have a fondness for the idea of
only updating those indexes that changed (PHOT/WARM/whatever) on update, maybe
I'll combine the ideas.

best.

-greg

ADDENDUM:

ExecUpdate (nodeModifyTable.c)
-> [PATCHED] after the l2: label, ExecCheckIndexedAttrsForChanges()
-> [UNPATCHED] (nothing at this point...)
-> table_tuple_update()
-> heap_update()
-> LockBuffer(BUFFER_LOCK_EXCLUSIVE)
-> [UNPATCHED] HeapDetermineColumnsInfo() // uses memcmp() for datum comparison
-> returns TM_Updated immediately, frees modified_attrs
-> EvalPlanQual() // recheck with updated tuple
-> ExecUpdate (retry)
-> [PATCHED] ExecCheckIndexedAttrsForChanges() // second evaluation
-> [UNPATCHED] (nothing at this point...)
-> table_tuple_update()
-> heap_update()
-> LockBuffer(BUFFER_LOCK_EXCLUSIVE)
-> HeapDetermineColumnsInfo() // second evaluation

As illustrated above, the patches move indexed attribute checking from
HeapDetermineColumnsInfo() (called once in heap_update under buffer lock
BUFFER_LOCK_EXCLUSIVE) to ExecCheckIndexedAttrsForChanges() (called in the
executor before calling table_tuple_update() when the buffer with the old tuple
is pinned, but not holding locked exclusively). In the unpatched code, when
heap_update() returns TM_Updated due to a concurrent update, it immediately
returns to the executor without retrying - the modified_attrs bitmapset is freed
and the executor performs an EPQ (EvalPlanQual) recheck before calling
table_tuple_update() again. On this second call, the unpatched code re-executes
HeapDetermineColumnsInfo() with the new tuple because the modified_attrs may
have changed. The patched code similarly calls ExecCheckIndexedAttrsForChanges()
again, as it will have jumped to the l2: label, before retrying
table_tuple_update().

Both versions re-evaluate to find the modified attributes bitmap on concurrent
conflicts, the key differences are:

1) the patched version performs richer comparisons (partial index predicates,
index AM amcomparedatums(), expression evaluation) while the unpatched
version only does binary datum comparison using memcmp() and,

2) under high contention running outside the buffer lock has increased the
probability TM_Updated because he window for concurrent modifications has
increased.

It may be that expanding the window for concurrency (2) during updates is a
mistake as it is impossible for the heap to update two different tuples on the
same page concurrently so this has no potential benefit and compounds the
expense.

However, it is entirely possible that other table AMs could have different
concurrency models and so in theory we should allow for that. Consider how some
LSMs work, concurrent updates are just appends to the current log file. Maybe
the table AM should indicate a need for a buffer lock or not during updates?

PATCHES:

0001: Refactors heapam_tuple_update()/heap_update() by removing some code in the
beginning of heap_update() into both heapam_tuple_update() and
simple_heap_update() as a preface for removing the need for
HeapDetermineColumnsInfo() when calling heapam_tuple_update().

0002: Moves the detection of modified indexed columns from within heap into the
executor in a new function ExecWhichIndexesRequireUpdates() that replaces
HeapDetermineColumnsInfo().

This shift now detects the set of modified indexed columns earlier in the
query execution process (just after the l2: label in nodeModifyTable.c)
and crucially outside of an exclusive buffer page lock as was the case
before in heap_update().

ExecCheckIndexedAttrsForChanges() returns a Bitmapset, identical to
HeapDetermineColumsInfo(), of the modified indexed columns. While similar
in spirit to HeapDetermineColumsInfo(), this new function also evaluates
expressions, and potentially calls into a new index AM API
amcomparedatums(). This is the key change that enables HOT updates when
expressions are unchanged on update.

The new optional index AM function amcomparedatums() allows index
implementations to define their own custom logic for determining if a
datum has changed and if that change requires that the index be updated
or not. This has been implemented for GIN and HASH.

The patch also updates the replication logic in execReplication() to
correctly provide the set of modified indexed attributes extracted from
the replicated tuples to avoid overhead of rediscovery later on.

Catalog tuple updates are unchanged and continue to use
simple_heap_update() which in turn calls HeapDetermineColumsInfo(). Note
that I proposed a separate patch set to remove HeapDetermineColumsInfo()
from simple_heap_update() by changing the way catalog tuples are updated,
but that has since been abandoned. See cf-6221 for more details on this.

0003: Removes the now-redundant index_unchanged_by_update() function. Its
functionality is superseded by simply reusing the Bitmapset created by
ExecWhichIndexesRequireUpdates() and stored in ri_ChangedIndexedCols.

0004: Moves the evaluation of partial index predicates from after the table
update to before it, within the ExecWhichIndexesRequireUpdates(). If both
the old and new tuples are outside the index's predicate, the index is not
considered impacted, and so the attributes in the predicate are not
recorded in the resulting Bitmapset potentially allowing for a HOT update.

PERFORMANCE TEST RESULTS:

Platform: FreeBSD 15 (Intel NUC)
Duration: 600 seconds (10 minutes) per-test, ~3.5 hours total
Baseline: 8ebdf41c262 (origin/master)
Patched: bdfe58c4537 (origin/master + 4 patches)

HOT Update Percentages:
────────────────────────────────────────────────────────────────
Test Baseline Patched Δ
────────────────────────────────────────────────────────────────
JSONB BTREE expression index 0.0% 95.0% 95.0%
JSONB GIN expression index 0.0% 98.1% 98.1%
Partial index 0.0% 96.3% 96.3%
3 expression indexes 0.0% 98.2% 98.2%
6 expression indexes 0.0% 100.0% 100.0%
Array expression index 0.0% 97.8% 97.8%
UPDATE 100 rows 0.0% 99.7% 99.7%
Control: indexed field changes 47.7% 48.2% .5%
Control: expensive expression 0.0% 0.0% 0%
Control: Induced TU_Updated 0.1% 0.1% 0%

Throughput (10 clients):
────────────────────────────────────────────────────────────────
Test Baseline Patched Δ
────────────────────────────────────────────────────────────────
JSONB BTREE expression index 563.6 543.2 -3.6%
JSONB GIN expression index 556.5 536.6 -3.5%
Partial index 415.6 406.4 -2.2%
3 expression indexes 553.6 535.1 -3.3%
6 expression indexes 528.9 511.8 -3.2%
Array expression index 560.7 542.6 -3.2%
UPDATE 100 rows 840.9 1066.7 26.8%
Control: indexed field changes 564.6 529.0 -6.3%
Control: expensive expression 563.4 527.3 -6.4%
Control: Induced TU_Updated 1677.5 1730.4 3.1%

Index Bloat After Updates (MB):
────────────────────────────────────────────────────────────────
Test Baseline Patched Reduction
────────────────────────────────────────────────────────────────
JSONB BTREE expression index 8.2 2.9 64.6%
JSONB GIN expression index 8.8 6.3 27.3%
Partial index 4.3 2.1 49.8%
3 expression indexes 13.4 4.1 68.7%
6 expression indexes 26.7 5.9 77.7%
Array expression index 12.8 4.3 66.5%
UPDATE 100 rows 453.5 6.9 98.4%
Control: indexed field changes 4.6 4.5 .6%
Control: expensive expression 27.2 25.9 4.8%
Control: Induced TU_Updated 7.9 7.9 0%

Attachment	Content-Type	Size
perf-cf5556	application/octet-stream	13.3 KB
v29-0001-Prepare-heapam_tuple_update-and-simple_heap_upda.patch	text/x-patch	47.8 KB
v29-0002-Track-changed-indexed-columns-in-the-executor-du.patch	text/x-patch	113.5 KB
v29-0003-Replace-index_unchanged_by_update-with-ri_Change.patch	text/x-patch	8.4 KB
v29-0004-Identify-if-partial-indexes-are-impacted-by-an-u.patch	text/x-patch	3.8 KB

In response to

Re: Expanding HOT updates for expression and partial indexes at 2026-01-13 14:54:10 from Greg Burd

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tatsuya Kawata	2026-02-10 15:18:26	Re: [PATCH] Add sampling statistics to autoanalyze log output
Previous Message	Daniil Davydov	2026-02-10 15:03:45	Re: POC: Parallel processing of indexes in autovacuum