From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Greg Burd <greg(at)burd(dot)me> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Expanding HOT updates for expression and partial indexes |
Date: | 2025-10-14 18:43:09 |
Message-ID: | f59348afe291ef62b8596fc1615104c51c620bec.camel@j-davis.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 2025-10-14 at 13:46 -0400, Greg Burd wrote:
> I'm trying to knit this into the executor layer but that is tricky
> because
> the concept of HOT is very heap-specific, so the executor should be
> ignorant of the heap's specific needs (right?).
It's wrong for the executor to say "do a HOT update" but it's OK for
the executor to say "this is the set of indexes that might have a new
key after the update". If that set is empty, then the heap can choose
to do a HOT update.
> Right now, I am considering
> adding a step in ExecUpdatePrologue() just after opening the indexes.
Seems like a reasonable place.
> The idea I'm toying with is to have a new function on all
> TupleTableSlots...
>
> That way for heap we'd have something like:
> Bitmapset *tts_heap_getidxattr(ResultRelInfo *info,
> TupleTableSlot *updated,
> TupleTableSlot *existing,
> Bitmapset *updated_attrs)
> {
> some combo of HeapDeterminColumnsInfo() and
> ExecExprIndexesRequireUpdates()
>
> returns the set of indexed attrs that this update changed
> }
Why is this a generic method for all slots? Do we need to reuse it
somewhere else? I would have expected just a static method in
nodeModifyTable.c that does just what's needed.
And to be precise, it's the set of indexed attrs where the update might
have created a new key, right? The whole point is that we don't care if
the indexed attr has been changed, so long as it doesn't create a new
index key.
> Interestingly, summarizing indexes that don't overlap with changed
> attributes won't be updated (and that's a good thing).
Nice.
> Problem is we're not yet accounting for what is about to happen in
> ExecUpdateAct() when calling into the heap_update(). That's where
> heap tries to fit the new tuple onto the same page. That might be
> possible with large tuples thanks to TOAST, it's impossible to say
> before getting into this function with the page locked.
I don't see why that's a problem. The executor can pass down the list
of indexed attrs that might have created new keys after the update,
then heap_update uses that information (along with other factors, like
if it fits on the same page) to determine whether to perform a HOT
update or not.
> So, for updates we include the modified_attrs in the UpdateContext
> which is available to heap_update().
It doesn't look like UpdateContext is currently available to
heap_update(). We might need to change the signature. But I think it's
fine to change the signature if it results in a cleaner design --
tableam extensions often need source changes when new major versions
are released.
> If the heap code decides to
> go HOT, great unset all attributes in the modified_attrs except any
> that are only summarizing. If the heap can't go HOT, fine, add
> the indexed attrs back into modified_attrs which should trigger all
> indexes to be updated.
IIUC, that sounds like a good plan.
> This gets rid of TU_UpdateIndexes enum and allows only modified
> summarizing indexes to be updated on the HOT path. Two additional
> benefits IMO.
I'm not sure that I understand, but I'll look at that after we sort out
some of the other details.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-10-14 18:49:06 | Re: pg_createsubscriber - more logging to say if there are no pubs to drop |
Previous Message | Masahiko Sawada | 2025-10-14 18:09:43 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |