Re: Expanding HOT updates for expression and partial indexes

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Burd <greg(at)burd(dot)me>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2025-10-14 18:43:09
Message-ID: f59348afe291ef62b8596fc1615104c51c620bec.camel@j-davis.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2025-10-14 at 13:46 -0400, Greg Burd wrote:
> I'm trying to knit this into the executor layer but that is tricky
> because
> the concept of HOT is very heap-specific, so the executor should be
> ignorant of the heap's specific needs (right?).

It's wrong for the executor to say "do a HOT update" but it's OK for
the executor to say "this is the set of indexes that might have a new
key after the update". If that set is empty, then the heap can choose
to do a HOT update.

> Right now, I am considering
> adding a step in ExecUpdatePrologue() just after opening the indexes.

Seems like a reasonable place.

> The idea I'm toying with is to have a new function on all
> TupleTableSlots...
>
> That way for heap we'd have something like:
> Bitmapset *tts_heap_getidxattr(ResultRelInfo *info,
> TupleTableSlot *updated,
> TupleTableSlot *existing,
> Bitmapset *updated_attrs)
> {
> some combo of HeapDeterminColumnsInfo() and
> ExecExprIndexesRequireUpdates()
>
> returns the set of indexed attrs that this update changed
> }

Why is this a generic method for all slots? Do we need to reuse it
somewhere else? I would have expected just a static method in
nodeModifyTable.c that does just what's needed.

And to be precise, it's the set of indexed attrs where the update might
have created a new key, right? The whole point is that we don't care if
the indexed attr has been changed, so long as it doesn't create a new
index key.

> Interestingly, summarizing indexes that don't overlap with changed
> attributes won't be updated (and that's a good thing).

Nice.

> Problem is we're not yet accounting for what is about to happen in
> ExecUpdateAct() when calling into the heap_update().  That's where
> heap tries to fit the new tuple onto the same page.  That might be
> possible with large tuples thanks to TOAST, it's impossible to say
> before getting into this function with the page locked.

I don't see why that's a problem. The executor can pass down the list
of indexed attrs that might have created new keys after the update,
then heap_update uses that information (along with other factors, like
if it fits on the same page) to determine whether to perform a HOT
update or not.

> So, for updates we include the modified_attrs in the UpdateContext
> which is available to heap_update().

It doesn't look like UpdateContext is currently available to
heap_update(). We might need to change the signature. But I think it's
fine to change the signature if it results in a cleaner design --
tableam extensions often need source changes when new major versions
are released.

>   If the heap code decides to
> go HOT, great unset all attributes in the modified_attrs except any
> that are only summarizing.  If the heap can't go HOT, fine, add
> the indexed attrs back into modified_attrs which should trigger all
> indexes to be updated.

IIUC, that sounds like a good plan.

> This gets rid of TU_UpdateIndexes enum and allows only modified
> summarizing indexes to be updated on the HOT path.  Two additional
> benefits IMO.

I'm not sure that I understand, but I'll look at that after we sort out
some of the other details.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-10-14 18:49:06 Re: pg_createsubscriber - more logging to say if there are no pubs to drop
Previous Message Masahiko Sawada 2025-10-14 18:09:43 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart