Re: Expanding HOT updates for expression and partial indexes

From: "Greg Burd" <greg(at)burd(dot)me>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2026-02-26 22:08:17
Message-ID: 9bb9bdd6-e1fe-48fe-837d-4d0289396f1c@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Wed, Feb 25, 2026, at 4:03 PM, Jeff Davis wrote:
> On Mon, 2026-02-23 at 14:23 -0500, Greg Burd wrote:
>> Hello.
>>
>> Attached is a new patch set that fixes a few issues identified in the
>> last set.
>>
>> 0001 - creates a new way to identify the set of attributes both
>> modified by the update and referenced by one or more indexes on the
>> target relation being updated.  This patch keeps the
>> HeapDetermineColumnsInfo() path within heap_update() for calls from
>> simple_heap_update() when modified_attrs_valid is set to false.  I'm
>> not a huge fan of this, but it does serve as a way to illustrate a
>> minimal set of changes easing review a bit.
>>
>> 0002 - splits out the top portion of heap_update() into both
>> heapam_tuple_update() and simple_heap_update(), adds a few helper
>> functions and tries to reduce repeated code.  The goal here was to
>> remove some of the mess related to the various bitmaps used to make
>> decisions during the update.
>
> IIUC, a minimal version of this patch set might be:
>
> * add 'mix_attrs' bitmap to API for table_tuple_update
> * have executor calculate the bitmap, using the old slot to see if
> expression results have changed
> * have simple_heap_update calculate the bitmap using heap_fetch to get
> the old tuple (would be a redundant pin, but not sure if that's a
> problem or not)
>
> And leave the rest mostly unchanged.
>
> Did I miss something? If not, it would be nice to see such a minimal
> patch and/or understand why we don't follow that approach.

Hey Jeff, thanks for sticking with me on this journey. :)

I think your approach makes sense, here's a summary of what's attached (v30) and at the bottom of this email are some early performance measurements.

* in the executor
* identify the mix_attrs
* one new argument to table_tuple_update( ..., mix_attrs, ...)
* heapam_tuple_update( ..., mix_attrs, ...)
* calculates hot_allowed using mix_attrs
* calculates lockmode using key_attrs and mix_attrs
* two new arguments to heap_update(..., mix_attrs, hot_allowed, ...)
* on return determines what to do with TU_UpdateIndexes
* heap_update( ..., mix_attrs, hot_allowed, ... )
* takes buffer lock
* calculates rep_id_key_req, passes that to ExtractReplicaId()
* if newbuf==buffer && hot_allowed -> HOT
* releases buffer lock

* simple_heap_update( ... no changes to API ... )
* now needs to compare old/new tuples *BEFORE* calling heap_update()
* uses heap_fetch() to turn otid -> oldtuple
* calls HeapUpdateModIdxAttrs()
* calculates lockmode
* calculates if hot is allowed
* calls into heap_update(..., mix_attrs, hot_allowed, ...)
* on return determines what to do with TU_UpdateIndexes
* renamed HeapDetermineColumnsInfo() to HeapUpdateModIdxAttrs()
* removed logic related to rep_id_key_req, that is in heap_update()

> Regards,
> Jeff Davis

There are a pair of functions now for finding "mix_attrs" that replace the singular HeapDetermineColumnsInfo() function:
ExecUpdateModIdxAttrs()
HeapUpdateModIdxAttrs()
These do essentially the same thing, only with different available information and where the latter is called within the context of a buffer lock.

In ExecUpdateModIdxAttrs() we compare two TupleTableSlots, the existing and the plan slot, using a new helper function ExecCompareSlotAttrs(). This gives us the "mix_attrs" (modified indexed attributes) bitmap. In this function we have the ResultRelInfo and EState so it is possible to use the ExecGetAllUpdatedCols() function to potentially reduce the set of attributes we need to check for changes. The function only reviews indexed attributes that also exist in that set, which led to an interesting discovery... see below.

In HeapUpdateModIdxAttrs() we start with an old TID and a HeapTuple, so first we need to fetch that old HeapTuple so we can compare old/new datum and find any modified indexed attributes.

A new function HeapUpdateHotAllowable() is used in heapam_tuple_update() and simple_heap_update() encapsulating that logic in one place including the "only summarized" test. Heap will use the HOT path if that function returns true and the tuple fits on the same buffer page. No logic changes, just moved the decision making around a bit.

A new function HeapUpdateDetermineLockMode() is used to choose exclusive or shared lock mode ahead of calling into heap_update(). Again, same logic as before.

It turns out that ExecGetAllUpdatedCols() doesn't get all updated columns as the name advertises. It finds all the attributes (columns) that were mentioned in the UPDATE statement or any triggers that will fire during the update, but that overlooks any attributes changed within before-row triggers that invoke functions which call heap_modify_tuple(). This happens when tsvector_update_trigger() is called in tsearch.sql, the code modifies an indexed attribute not mentioned in the UPDATE or triggers. I've fixed this oversight and to me this makes sense, but tell me if you disagree.

generated_virtual.sql and updatable_views.sql had tests where the scan order of the tuples on the pages seems to now be non-deterministic. I've updated those tests to ensure stability. AFAICT my changes in this patch should not change any HOT decision or any replica identity key WAL logging decision, but somehow they uncovered this instability. Or there is a bug, but I've not spotted that as yet. Feel free to point out the obvious if you do. :)

Just to be clear, this patch doesn't include any of $subject. In tests I've not measured any performance regressions, and that's not surprising as the sum total computational effort is nearly identical before/after the patch. Yes, the patch moves some computation outside the buffer lock on the heap page and that might open the door to more concurrency or slightly different behavior when updates are highly contentious. There may be more occasions where TU_Updated is returned, or some speed improvements when updating more than one row at a time.

My hope is to get this into a shape where we're comfortable with these changes and it can be committed even though none of $subject is achieved because it does lay some ground work for those future HOT expanding and WARM/PHOT enabling ideas I've been working on.

Things on my TODO list, short term:
* Re-introduce the index AM's new function to allow indexes to play a role in when they require new index entries
* I'm not a fan of TU_UpdateIndexes, it's *very* heap-specific, I'd like to eliminate this

Longer term, so as to return to working on $subject:
* Allow types to indicate that they maintain "sub-attributes" that might be used to form index key datum
* Identify in the executor for each attribute SET if a) it has sub-attributes, and if so b) does the new value for the attribute change any sub-attribute that is used to form index keys
* With the previous two ideas I think we can safely re-introduce HOT for expressions without re-evaluating the expressions and comparing index datum (read: without the overhead I've measured in the past)
* PHOT or WARM or <other nifty name here>, teach heap how to only update changed indexes (rather than the all or nothing approach we have today)

I look forward to community feedback.

best.

-greg

----------------- PERFORMANCE TEST RESULTS

DISCLAIMER: "claude" and I worked on the perf-cf5556-v30.sh script, as I'm sure is apparent. I think people call it "vibe coding" when you try to contain the enthusiasm of your friendly LLM and direct it toward some goal. IME it's like trying to control a room full of dangerously knowledgeable and overly eager to please kindergarten-aged parrots. I admit to needing more time to review the script, the test cases, and the results to fully explore these changes and validate that they actually measure something meaningful. If you find something silly or a glaring mistake, go easy on me (and "claude") but do let me (us?) know.

$ ./perf-cf5556-v30.sh
Checking for running PostgreSQL instances...
✓ No other PostgreSQL instances running

╔════════════════════════════════════════════════════════════════════╗
║ CF-5556 PERFORMANCE TEST SUITE
╚════════════════════════════════════════════════════════════════════╝
Configuration:
Test duration : 60s
Clients / Jobs : 8 / 4
Results directory : /tmp/cf5556-perf-results/20260226_150623
Setup extensions : NO
Test extensions : NO

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
BUILDING AND TESTING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Baseline: d0833fdae7e (origin/master)
Patches: 1 patch(es) to test cumulatively

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VERSION: baseline (d0833fdae7e)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Building PostgreSQL...
✓ PostgreSQL built
Starting server...
✓ Server started
shared_preload_libraries: pg_stat_statements
Setting up test databases...
Creating driver_license table (100k rows, 5 BTREE indexes)...
✓ driver_license ready (100000 rows)
Creating t_jsonb table (10k rows, 3 BTREE expression indexes)...
✓ t_jsonb ready (10k rows)
Creating t_gin table (10k rows, GIN index — control)...
✓ t_gin ready (10k rows, GIN — control)

Running isolated tests (60s each)...

license_write_single TPS: 69816.475176 Lat: 0.115ms
jsonb_write_single TPS: 56571.238721 Lat: 0.141ms
jsonb_write_batch TPS: 3606.918699 Lat: 2.218ms
gin_write_single TPS: 64040.989986 Lat: 0.125ms
pgbench_tpcb-like TPS: 20133.238199 Lat: 0.397ms
pgbench_simple-update TPS: 19219.239741 Lat: 0.416ms

Running concurrent read/write tests...

Running concurrent test: 2 writers + 6 readers...
jsonb_2w_6r Write: 14776.018733 TPS Read: 80627.112253 TPS
Write: 0.135 ms Read: 0.074 ms
Running concurrent test: 4 writers + 4 readers...
jsonb_4w_4r Write: 29399.436764 TPS Read: 52688.734439 TPS
Write: 0.136 ms Read: 0.076 ms
Running concurrent test: 6 writers + 2 readers...
jsonb_6w_2r Write: 43151.037340 TPS Read: 25295.378828 TPS
Write: 0.139 ms Read: 0.079 ms
Running concurrent test: 2 writers + 6 readers...
license_2w_6r Write: 18891.968944 TPS Read: 74113.652071 TPS
Write: 0.106 ms Read: 0.081 ms
Running concurrent test: 4 writers + 4 readers...
license_4w_4r Write: 37305.015382 TPS Read: 48489.296785 TPS
Write: 0.107 ms Read: 0.082 ms
Running concurrent test: 6 writers + 2 readers...
license_6w_2r Write: 54403.687584 TPS Read: 23519.461792 TPS
Write: 0.110 ms Read: 0.085 ms

Stopping server...
✓ Server stopped

fatal: a branch named 'cf-5556-test-all-patches' already exists
Applying all 1 patches cumulatively...
Applying v20260226b-0001-Idenfity-modified-indexed-attributes-in-t.patch...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VERSION: patched (526c2a8733d)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Building PostgreSQL...
✓ PostgreSQL built
Starting server...
✓ Server started
shared_preload_libraries: pg_stat_statements
Setting up test databases...
Creating driver_license table (100k rows, 5 BTREE indexes)...
✓ driver_license ready (100000 rows)
Creating t_jsonb table (10k rows, 3 BTREE expression indexes)...
✓ t_jsonb ready (10k rows)
Creating t_gin table (10k rows, GIN index — control)...
✓ t_gin ready (10k rows, GIN — control)

Running isolated tests (60s each)...

license_write_single TPS: 70093.895595 Lat: 0.114ms
jsonb_write_single TPS: 56751.907107 Lat: 0.141ms
jsonb_write_batch TPS: 4141.086856 Lat: 1.932ms
gin_write_single TPS: 63845.491951 Lat: 0.125ms
pgbench_tpcb-like TPS: 19911.229480 Lat: 0.402ms
pgbench_simple-update TPS: 19840.566625 Lat: 0.403ms

Running concurrent read/write tests...

Running concurrent test: 2 writers + 6 readers...
jsonb_2w_6r Write: 14821.571968 TPS Read: 81057.390470 TPS
Write: 0.135 ms Read: 0.074 ms
Running concurrent test: 4 writers + 4 readers...
jsonb_4w_4r Write: 29428.063408 TPS Read: 52533.626129 TPS
Write: 0.136 ms Read: 0.076 ms
Running concurrent test: 6 writers + 2 readers...
jsonb_6w_2r Write: 43204.958598 TPS Read: 25301.939523 TPS
Write: 0.139 ms Read: 0.079 ms
Running concurrent test: 2 writers + 6 readers...
license_2w_6r Write: 18958.924353 TPS Read: 74548.095482 TPS
Write: 0.105 ms Read: 0.080 ms
Running concurrent test: 4 writers + 4 readers...
license_4w_4r Write: 37185.369146 TPS Read: 48580.299936 TPS
Write: 0.108 ms Read: 0.082 ms
Running concurrent test: 6 writers + 2 readers...
license_6w_2r Write: 54461.228141 TPS Read: 23692.388873 TPS
Write: 0.110 ms Read: 0.084 ms

Stopping server...
✓ Server stopped

╔════════════════════════════════════════════════════════════════════╗
║ RESULTS SUMMARY
╚════════════════════════════════════════════════════════════════════╝

═══════════════════════════════════════════════════════════════════════════════════
ISOLATED WORKLOAD COMPARISON (Patched vs Baseline)
═══════════════════════════════════════════════════════════════════════════════════
Table Workload Baseline TPS Patched TPS Δ%
───────────────────────────────────────────────────────────────────────────────────
gin write_single 64041.0 63845.5 -0.3%
jsonb write_batch 3606.9 4141.1 +14.8%
jsonb write_single 56571.2 56751.9 +0.3%
license write_single 69816.5 70093.9 +0.4%
pgbench simple-update 19219.2 19840.6 +3.2%
pgbench tpcb-like 20133.2 19911.2 -1.1%
───────────────────────────────────────────────────────────────────────────────────

═══════════════════════════════════════════════════════════════════════════════════
CONCURRENT WORKLOAD ANALYSIS (Write Pressure Impact on Reads)
═══════════════════════════════════════════════════════════════════════════════════
Table Write:Read Base Write Patch Write Base Read Patch Read
───────────────────────────────────────────────────────────────────────────────────
jsonb 2w_6r 14776.0 14821.6 80627.1 81057.4
license 2w_6r 18892.0 18958.9 74113.7 74548.1
jsonb 4w_4r 29399.4 29428.1 52688.7 52533.6
license 4w_4r 37305.0 37185.4 48489.3 48580.3
jsonb 6w_2r 43151.0 43205.0 25295.4 25301.9
license 6w_2r 54403.7 54461.2 23519.5 23692.4
───────────────────────────────────────────────────────────────────────────────────

Output files:
/tmp/cf5556-perf-results/20260226_150623/results.txt (raw results)
/tmp/cf5556-perf-results/20260226_150623/*_server.log (server startup/error logs)
/tmp/cf5556-perf-results/20260226_150623/*_setup.log (database setup logs)
/tmp/cf5556-perf-results/20260226_150623/*_build.log (build logs)
/tmp/cf5556-perf-results/20260226_150623/*_*.txt (pgbench output)
/tmp/cf5556-perf-results/20260226_150623/*_*.sql (test queries)

Cleaning up...
✓ Cleanup complete

Attachment Content-Type Size
v30-0001-Idenfity-modified-indexed-attributes-in-t.patch text/x-patch 57.1 KB
perf-cf5556-v30.sh application/x-shellscript 30.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-02-26 22:14:43 Re: Show expression of virtual columns in error messages
Previous Message Jacob Champion 2026-02-26 21:56:56 Re: [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?)