Re: Expanding HOT updates for expression and partial indexes

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Greg Burd <greg(at)burd(dot)me>
Cc: "Burd, Greg" <gregburd(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2025-10-08 20:48:46
Message-ID: aObOLg3JsebbwjOU@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 07, 2025 at 05:36:11PM -0400, Greg Burd wrote:
> I put the patch aside for a while, then this past week at PGConf.dev/NYC
> I heard interest from a few people (Jeff Davis, Nathan Bossart) who
> encouraged me to move the code executing the expressions to just before
> acquiring the lock but after pinning the buffer. The theory being that
> my new code using the old/new tts to form and test the index tuples
> resulting from executing expressions was using the resultsRelInfo struct
> created during plan execution, not the information found on the page,
> and so was safe without the lock.

An open question (at least from me) is whether this is safe. I'm not
familiar enough with this area of code yet to confidently determine that.

> After reviewing how updates work in the executor, I discovered that
> during execution the new tuple slot is populated with the information
> from ExecBuildUpdateProjection() and the old tuple, but that most
> importantly for this use case that function created a bitmap of the
> modified columns (the columns specified in the update). This bitmap
> isn't the same as the one produced by HeapDetermineColumnsInfo() as the
> latter excludes attributes that are not changed after testing equality
> with the helper function heap_attr_equals() where as the former will
> include attributes that appear in the update but are the same value as
> before. This, happily, is immaterial for the purposes of my function
> ExecExprIndexesRequireUpdates() which simply needs to check to see if
> index tuples generated are unchanged. So I had all I needed to run the
> checks ahead of acquiring the lock on the buffer.

Nice.

> There is much room for improvement, and your suggestions are welcome.

A general and predictable suggestion is to find ways to break this into
smaller pieces. As-is, this patch would take me an enormous amount of time
to review in any depth. If we can break off some smaller pieces that we
can scrutinize and commit independently, we can start making forward
progress sooner. The UpdateContext and reloption stuff are examples of
things that might be possible to split into independent patches.

> I'll find time to quantify the benefit of this patch for the targeted
> use cases and to ensure that all other cases see no regressions.

Looking forward to these results. This should also help us decide whether
to set expression_checks by default.

> I added a reloption "expression_checks" to disable this new code path.
> Good idea or bad precedent?

If there are cases where the added overhead outweighs the benefits (which
seems like it must be true some of the time), then I think we must have a
way to opt-out (or maybe even opt-in). In fact, I'd advise adding a GUC to
complement the reloption so that users can configure it at higher levels.

> In execIndexing I special case for IsolationIsSerializable() and I can't
> remember why now but I do recall one isolation test failing... I'll
> check on this and get back to the thread. Or maybe you know why that

I didn't follow this.

> I'd like not to build, then rebuild index tuples for these expressions
> but I can't think of a way to do that without a palloc(), this is
> avoided today.

Is the avoidance of palloc() a strict rule? Is this discussed in the code
anywhere?

--
nathan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-10-08 20:52:17 Re: Should we update the random_page_cost default value?
Previous Message Tomas Vondra 2025-10-08 20:46:34 Re: Fix overflow of nbatch