Re: documentation on HOT

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org>
Subject: Re: documentation on HOT
Date: 2022-07-22 16:25:43
Message-ID: CAKFQuwZ_-k6ny-tbV-AZT142vBc2eyK6LtMXxQPSCjXFTM95PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org>
wrote:

> On 7/22/22 8:51 AM, Bruce Momjian wrote:
> > On Thu, Jul 21, 2022 at 10:02:18PM -0400, Bruce Momjian wrote:
> >> On Sun, Feb 6, 2022 at 09:29:56PM -0500, Jonathan Katz wrote:
> >>> I agree with Bruce's point that we should have a new section (or
> >>> subsection). As I mentioned in my previous post, given HOT involves
> >>> indexing, I would suggest putting it there.
> >>>
> >>> I think that something that follows the general outline of Laurenz's
> post
> >>> would satisfy the user requirements. It explains at a high level what
> HOT
> >>> is, it's advantages, and how it works.
> >>
> >> Attached is a patch that adds a new HOT section to the storage chapter.
> >
> > I wasn't happy with the last paragraph so I added some more details.
> > URL contents updated too:
> >
> > https://momjian.us/tmp/pgsql/storage-hot.html
>
> Thanks! This is great. Probably the most concise and clear explanation I
> have seen for HOT, which is exactly what we need for the docs :)
>
> A few suggestions:
>
> s/Fortunately, there is/To help reduce overhead, PostgreSQL has/
>
> s/In summary, h/H/
>
> > You can increase the change of HOT updates by using non-default table
> fillfactor settings.
>
> I think we should expand on this and explain how adjusting "fillfactor"
> will affect this. I think that may change the final sentence too.
>
>
I think we need to expose the information regarding columns used in
predicates here.

"(Here, "indexed column" means any column referenced
at all in an index definition, including for example columns that are
tested in a partial-index predicate but are not stored in the index.)"

I get it is an implementation detail but explaining the name seems like a
good thing to do as well:

"Without HOT, every version of a row in an update chain has its own index
entries, even if all indexed columns are the same. With HOT, a new tuple
placed on the same page and with all indexed columns the same as its
parent row version does not get new index entries. This means there is
only one index entry for the entire update chain on the heap page.
An index-entry-less tuple is marked with the HEAP_ONLY_TUPLE flag."

Where the last sentence becomes: "Those index-entry-less tuples (yeah,
still dislike triple-hypenation...) are thus named "Heap-Only Tuples".

(I've actually incorporated this as I think it should be down below, as a
lead-in to the listing of conditions for when the optimization can be used.)

Then maybe "can be removed during select" should be reworded as:

"No longer visible heap-only tuples can be removed during normal
operation, including <command>SELECT</command>s, instead of requiring
periodic vacuum operations."

The original heap entry the index points to cannot be removed. "Old
versions of heap-only tuples" vs. "No longer visible heap-only tuples" is
probably a style choice. There are basically three different "versions" in
context here though so avoiding "old versions" has some appeal to me.

I'm not a fan of:

"Fortunately, there is an automatic system..."

I'd like to give credit to the fact we engineered a solution to the
downsides, so change the lead-in paragraph to the conditions listing to be:

"To mitigate these downsides PostgreSQL implements an optimization whereby
sometimes only the heap tuple is created, not the index entry, when
performing an update. In a case of giving things obvious and meaningful
names, this is the Heap-Only Tuple (HOT) Optimization. This optimization
is possible when:"

David J.

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Peter Geoghegan 2022-07-22 17:05:04 Re: documentation on HOT
Previous Message Fujii Masao 2022-07-22 15:35:13 Re: Question about role attributes docs