Re: Expanding HOT updates for expression and partial indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Greg Burd <greg(at)burd(dot)me>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expanding HOT updates for expression and partial indexes
Date: 2025-11-24 18:59:10
Message-ID: CAEze2WjbVc1XfyfGCF_bHV_2V3Gk_bu6P2SX-YygsDqCEnnCEg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 22 Nov 2025, 22:30 Greg Burd, <greg(at)burd(dot)me> wrote:
> Thanks for pointing out the oversight for index-oriented scans (IOS),
> you're right that the code in v22 doesn't handle that correctly. I'll
> fix that. I still think that indexes that don't support IOS can and
> should use the type-specific equality checks. This opens the door to
> HOT with custom types that have unusual equality rules (see BSON).

Do you have specific examples why it would be safe to default to
"unusual equality rules" for generally any index's data ingestion
needs? Why e.g. BSON must always be compared with their special
equality test (and not datumIsEqual), and why IOS-less indexes in
general are never going to distinguish between binary distinct but
btree-equal values, and why exact equality is the special case here?

I understand that you want to maximize optimization for specific
workloads that you have in mind, but lacking evidence to the contrary
I am really not convinced that your workloads are sufficiently
generalizable that they can (and should) be the baseline for these new
HOT rules: I have not yet seen good arguments why we could relax
"datum equality" to "type equality" without potentially breaking
existing indexes.

HOT was implemented quite conservatively to make sure that there are
no issues where changed values are not reflected in indexes: each
indexed TID represents a specific and unchanging set of indexed
values, in both the key and non-key attributes of indexes. If a value
changes however so slightly, that may be a cause for indexes to treat
it differently, and thus HOT must not be used.

Aside: The amsummarizing optimization gets around that check by
realizing the TID itself isn't really indexed, so the rules can be
relaxed around that, but it still needs to go through the effort to
update the summarizing indexes if the relevant attributes were ever so
slightly updated.

This patch right now wants to change these rules and behaviour of HOT
in two ways:

1.) Instead of only testing attributes mentioned by indexed
expressions for changes, it wants to test the output of the indexed
expressions.
I would consider this to be generally safe, as long as the expressions
comply with the rules we have for indexed expressions. [Which, if not
held, would break IOS and various other things, too, so relying on
these rules isn't new or special].

2.) Instead of datumIsEqual, it (by default) wants to do equality
checks as provided by the type's default btree opclass' = operator.
I have not seen evidence that this is safe. I have even explained with
an example that IOS will return distinctly wrong results if only
btree's = operator is used to determine if HOT can be applied, and
that doesn't even begin to cover the issues related to indexes that
may handle data differently from Btree.
I also don't want indexes that at some point in the future invent
support for IOS to return subtly incorrect results due to HOT checks
that depended on the output of a previous version's amcanreturn
output.

So, IMV, tts_attr_equal is just a rather expensive version of
datumIsEqual: the fast path cases would've been handled by
datumIsEqual at least as fast (without a switch() statement with 18
specific cases and a default branch); if there is no btree operator
it'll still default to btree compare, and if not then if the slow path
uses correctly implemented compare operators (for HOT, and potentially
all other possible indexes), then these would have an output that is
indistinguishable from datumIsEqual, with the only difference the
address and performance of the called function and a lot of added
catalog lookups.

All together, I think it's best to remove the second component of the
changes to the HOT rules (changing the type of matching done for
indexed values with tts_attr_compare) from this patchset.
If you believe this should be added regardless, I think it's best
discussed separately in its own thread and patchset -- it should be
relatively easy to introduce in both current and future versions of
this code, and (if you're correct and this is safe) it would have some
benefits even when committed on its own.

Kind regards,

Matthias van de Meent

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-11-24 19:03:06 Re: pgsql: Teach DSM registry to ERROR if attaching to an uninitialized ent
Previous Message Nico Williams 2025-11-24 18:54:11 Re: [oauth] SASL mechanisms