Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
Date: 2023-09-30 02:02:24
Message-ID: CAH2-WzkFjiayDUkgJ8kafNDzOiSngLwb=yVUJ_JRPsG0RtkUkw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 29, 2023 at 6:27 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I think that it's talking about what happens during opportunistic
> > pruning, in particular what happens to HOT chains. (Though pruning
> > does almost the same amount of useful work with non-heap-only tuples,
> > so it's a bit unfortunate that the name "HOT pruning" seems to have
> > stuck.)
>
> That's very likely what the intention was. I read it again, and the
> same confusion still sticks out to me: it doesn't say anything
> explicitly about opportunistic pruning (I'm not sure if that term is
> "public docs" level, so that's probably fine), and it doesn't scope
> the claim to intermediate tuples in a HOT chain -- indeed the context
> is the HOT feature generally.

It doesn't mention opportunistic pruning by name, but it does say:

"Old versions of updated rows can be completely removed during normal
operation, including SELECTs, instead of requiring periodic vacuum
operations."

There is a strong association between HOT and pruning (particularly
opportunistic pruning) in the minds of some hackers (and perhaps some
users), because both features appeared together in 8.3, and both are
closely related at the implementation level. It's nevertheless not
quite accurate to say that HOT "provides two optimizations", since
pruning (the second of the two bullet points) isn't fundamentally
different for pages that don't have any HOT chains, at least not at
the level of the heap pages (indexes are another matter).

Explaining these sorts of distinctions through prose is very
difficult. You really need diagrams for something like this IMV.
Without them, the only way to make all of this less confusing is to
avoid all discussion of pruning... but then you can't really make the
point about breaking the dependency on VACUUM, which is a relatively
important point -- one with real practical relevance.

> This is why I discovered it: it says "indexes do not reference their
> page item identifiers", which is manifestly not true when talking
> about the root item, and in fact would defeat the whole purpose of HOT
> (at least in an old-to-new chain like Postgres uses).

Yeah, but...that's not what was intended. Obviously, the index hasn't
changed, and we expect index scans to continue to give correct
answers. So it is pretty strongly implied that it continues to point
to something valid.
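
To make the root-item point concrete, here is a toy sketch in Python. It
is not PostgreSQL code: the LinePointer class, its "dead" flag, and the
prune_chain/index_fetch helpers are hypothetical names for illustration,
and visibility rules are reduced to a single boolean. It assumes the
usual behavior that pruning turns the root item into a redirect, marks
dead heap-only items unused, and leaves a dead stub when nothing in the
chain survives.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy model only. State names echo PostgreSQL's line pointer states, but
# everything else here (classes, fields, helpers) is hypothetical.
NORMAL, REDIRECT, DEAD, UNUSED = "LP_NORMAL", "LP_REDIRECT", "LP_DEAD", "LP_UNUSED"


@dataclass
class LinePointer:
    state: str
    value: Optional[str] = None   # tuple payload when state is NORMAL
    link: Optional[int] = None    # next chain member, or redirect target
    dead: bool = False            # stand-in for "no longer visible to anyone"


Page = Dict[int, LinePointer]


def prune_chain(page: Page, root: int) -> None:
    """Prune the chain whose root item is `root` (the item indexes point at)."""
    members, off = [], root
    while off is not None:
        members.append(off)
        off = page[off].link
    tuples = [o for o in members if page[o].state == NORMAL]

    # Dead versions form a prefix of the old-to-new chain.
    dead_prefix = []
    for o in tuples:
        if not page[o].dead:
            break
        dead_prefix.append(o)
    if not dead_prefix:
        return

    survivors = tuples[len(dead_prefix):]
    for o in dead_prefix:
        if o != root:
            # No index entry references a heap-only item, so it can be reused.
            page[o] = LinePointer(UNUSED)
    if survivors:
        # Indexes still reference the root item, so it must keep resolving.
        page[root] = LinePointer(REDIRECT, link=survivors[0])
    else:
        # Nothing left: keep a stub until VACUUM removes the index entries.
        page[root] = LinePointer(DEAD)


def index_fetch(page: Page, root: int) -> Optional[str]:
    """Follow an index entry's root item to the first live version, if any."""
    off = root
    while off is not None:
        lp = page[off]
        if lp.state == NORMAL and not lp.dead:
            return lp.value
        if lp.state in (DEAD, UNUSED):
            return None
        off = lp.link
    return None


# Item 1 is the root of a three-version chain; item 4 is a dead tuple
# that is not part of any HOT chain.
page: Page = {
    1: LinePointer(NORMAL, "v1", link=2, dead=True),
    2: LinePointer(NORMAL, "v2", link=3, dead=True),
    3: LinePointer(NORMAL, "v3"),
    4: LinePointer(NORMAL, "x", dead=True),
}
before = index_fetch(page, 1)
prune_chain(page, 1)
prune_chain(page, 4)
after = index_fetch(page, 1)
```

In this sketch the index entry for item 1 resolves to the same live
version before and after pruning, even though item 1 no longer holds a
tuple. Item 4 shows the non-HOT case: its tuple storage goes away during
pruning too, but the item itself remains as a stub.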

> Assuming people can be convinced this is confusing (I realize you may
> not be yet), I see two basic options:
>
> 1. Update this to discuss both intermediate tuples and root items
> separately. This could entail either one larger paragraph or splitting
> such that instead of "two optimizations" we say "three optimizations".
>
> 2. Change "old versions" to something like "intermediate versions in a
> series of updates".
>
> I prefer some form of (1) since it more fully describes the behavior,
> but we could tweak further for concision.

Bruce authored these docs. I was mostly just glad to have anything at
all about HOT in the user-facing docs, quite honestly.

--
Peter Geoghegan
