Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From: James Coleman <jtc331(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
Date: 2023-09-30 01:27:11
Message-ID: CAAaqYe-AQM2JDorJ-Z5E5Mc1VR6wNZJ1P_SLT2GtkBX=wZw=fQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Sep 29, 2023 at 11:45 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > my reading the issue is that "old versions" doesn't say
> > anything about "old HOT versions"; it seems to be describing what
> > happens generally when a heap-only tuple is written -- which would
> > include the first time a heap-only tuple is written.
>
> I think that it's talking about what happens during opportunistic
> pruning, in particular what happens to HOT chains. (Though pruning
> does almost the same amount of useful work with non-heap-only tuples,
> so it's a bit unfortunate that the name "HOT pruning" seems to have
> stuck.)

That's very likely what the intention was. I read it again, and the
same confusion still sticks out to me: it doesn't say anything
explicitly about opportunistic pruning (I'm not sure if that term is
"public docs" level, so that's probably fine), and it doesn't scope
the claim to intermediate tuples in a HOT chain -- indeed the context
is the HOT feature generally.

This is why I discovered it: it says "indexes do not reference their
page item identifiers", which is manifestly not true when talking
about the root item, and in fact would defeat the whole purpose of HOT
(at least in an old-to-new chain like Postgres uses).

Assuming people can be convinced this is confusing (I realize you may
not be yet), I see two basic options:

1. Update this to discuss both intermediate tuples and root items
separately. This could entail either one larger paragraph or splitting
such that instead of "two optimizations" we say "three optimizations".

2. Change "old versions" to something like "intermediate versions in a
series of updates".

I prefer some form of (1) since it more fully describes the behavior,
but we could tweak further for concision.

> > And when it's the
> > first heap-only tuple the "old version" would be the original version,
> > which would not be a heap-only tuple.
>
> The docs say "Old versions of updated rows can be completely removed
> during normal operation". Opportunistic pruning removes dead heap-only
> tuples completely, and makes their line pointers LP_UNUSED right away.
> But it can also entail removing storage for the original root item
> heap tuple, and making its line pointer LP_REDIRECT right away (not
> LP_DEAD or LP_UNUSED) at most once in the life of each HOT chain. So
> yeah, we're not quite limited to removing storage for heap-only tuples
> when pruning a HOT chain. Does that distinction really matter, though?

Given pageinspect can show you the original tuple still exists and
that the index still references it...I think it does.

I suppose very few people go checking that out, of course, but I'd
like to be precise.
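
To make the distinction concrete, here is a rough toy model of the line
pointer states described in the quoted text above. It is only a sketch in
Python, not PostgreSQL code: ItemId, prune_hot_chain, next_in_chain and the
"dead" flag are names invented for illustration, and the case where every
member of the chain is dead is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class LP(Enum):
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class ItemId:
    """Toy line pointer plus the bits of tuple state this sketch needs."""
    state: LP
    redirect_to: Optional[int] = None    # target offset when LP_REDIRECT
    next_in_chain: Optional[int] = None  # hypothetical stand-in for the chain link
    heap_only: bool = False
    dead: bool = False                   # toy flag: version visible to nobody


def prune_hot_chain(page: Dict[int, ItemId], root: int) -> None:
    """Prune the dead leading versions of one HOT chain (toy model)."""
    first = page[root].redirect_to if page[root].state is LP.REDIRECT else root
    chain = []
    off = first
    while off is not None:
        chain.append(off)
        off = page[off].next_in_chain

    dead_prefix = []
    for off in chain:
        if not page[off].dead:
            break
        dead_prefix.append(off)
    live = chain[len(dead_prefix):]
    if not dead_prefix or not live:
        return  # nothing to prune, or whole chain dead (out of scope here)

    # Dead heap-only tuples: storage removed, line pointer reusable at once.
    for off in dead_prefix:
        if off != root:
            page[off] = ItemId(LP.UNUSED)
    # The root line pointer survives as a redirect, because the index
    # entry still points at this offset.
    page[root] = ItemId(LP.REDIRECT, redirect_to=live[0])


def demo():
    # Offset 1 is the original tuple; offsets 2 and 3 are heap-only updates.
    page = {
        1: ItemId(LP.NORMAL, next_in_chain=2, dead=True),
        2: ItemId(LP.NORMAL, next_in_chain=3, heap_only=True, dead=True),
        3: ItemId(LP.NORMAL, heap_only=True),
    }
    index_entry = 1  # the index references the root offset throughout
    prune_hot_chain(page, index_entry)
    return page, index_entry
```

In this model the index entry never changes: after demo() it still refers to
offset 1, which has become a redirect to offset 3, while offset 2 is free for
reuse. That is the sense in which "indexes do not reference their page item
identifiers" holds for intermediate versions but not for the root item.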

Regards,
James Coleman
