Re: Additional Chapter for Tutorial

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Jürgen Purtz <juergen(at)purtz(dot)de>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Additional Chapter for Tutorial
Date: 2020-11-10 21:58:26
Message-ID: CAKFQuwbM_CDN5Mthjo0kmWReCBRhHTni-LRsgB3L3W4B6imiiQ@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Sun, Nov 8, 2020 at 8:56 AM Jürgen Purtz <juergen(at)purtz(dot)de> wrote:

> Good catches. Everything applied.
>

MVCC Section

The first paragraph and example in the MVCC section are good but seem
misplaced: their relationship to MVCC generally is tenuous. Rather, I
would expect a discussion of the serializable isolation mode to follow.

I'm not sure how much detail this section wants to get into given the
coverage of concurrency elsewhere in the documentation. "Not much" would
be my baseline.

I would suggest spelling out what "OLTP" stands for and ideally pointing
the user to the glossary for the term.

Tending more toward a style gripe, but the amount of leader phrases and
redundancy is at a level where I notice them when I read this, which is
not the impression I have from reading large portions of the documentation.
In particular:

"When we speak about transaction IDs, you need to know that xids are like
sequences."

"But keep in mind that xids are independent of any time measurement — in
milliseconds or otherwise. If you dive deeper into PostgreSQL, you will
recognize parameters with names such as 'xxx_age'. Despite their names,
these '_age' parameters do not specify a period of time but represent a
certain number of transactions, e.g., 100 million."

Could just be: xids are sequences and age computations involving them
measure a transaction count as opposed to a time interval.

Then I would consider adding a bit more detail/context here.

xids are 32-bit sequences, with a reserved value to handle wrap-around.
There are 4 billion values in the sequence but wrap-around handling must
occur every 2 billion transactions. Age computations involving xids measure
a transaction count as opposed to a time interval.
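
To make the wrap-around point concrete, here is a minimal sketch of
comparing values on a 32-bit circular sequence. It is an illustration only,
not PostgreSQL's actual code: the function names are hypothetical, and it
ignores the reserved special values entirely.

```python
# Illustrative sketch only: hypothetical helper names, reserved xid
# values are ignored, and this is not PostgreSQL's implementation.

XID_BITS = 32
XID_MODULUS = 1 << XID_BITS        # 4,294,967,296 values in the sequence
HALF_RANGE = 1 << (XID_BITS - 1)   # 2,147,483,648: the comparison horizon


def xid_age(current_xid: int, xid: int) -> int:
    """Count of transactions from xid up to current_xid, modulo 2**32.

    The result is a transaction count, not a time interval.
    """
    return (current_xid - xid) % XID_MODULUS


def xid_precedes(a: int, b: int) -> bool:
    """True if a is logically older than b on the circular sequence.

    A value is "older" when the forward distance from it to the other
    value is less than half of the sequence, which is why wrap-around
    must be handled before ages reach about 2 billion.
    """
    return (a - b) % XID_MODULUS >= HALF_RANGE


if __name__ == "__main__":
    # An xid issued shortly before the counter wrapped is still older
    # than a small xid issued shortly after it.
    print(xid_precedes(4_294_967_290, 100))   # True
    print(xid_age(100, 4_294_967_290))        # 106
```

The half-range test is the reason the usable window is 2 billion rather
than 4 billion: beyond that distance, "older" and "newer" become ambiguous.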

I would move the mention of "vacuum" to the main paragraph about delete
rather than leave it solely as a "keep in mind" note.

The part before the diagram seems like it should be much shorter and more
concise, and should provide links to the excellent documentation. The part
after the image, and the image itself, are good material, though possibly
they should be in a main administration chapter instead of an internals
chapter.

The first bullet of "keep in mind" is both wordy and wrong. In particular,
"as xids grow old row versions get out of scope over time" doesn't make
sense (or rather it only does in the context of wrap-around, not normal
visibility). Having the only mention of bloat be here is also not ideal;
it too should be woven into the main narrative. The "keep in mind"
section here should be a succinct recap of already covered material.
Nothing in it should be new to someone who just read the entire section.

I don't think that usage of exclamation marks (!) is warranted here, though
emphasis on the key phrase wouldn't hurt.

Vacuum Section

avoid -> prevent (continued growth)

Autovacuum is enabled by default. The whole note needs commas.

I'd try to get rid of "at arbitrary point in time"

"Instance." We've already described where instances are ("on the
server").

The other sections seem misplaced for the tutorial; update the main
documentation if this information is wholly missing or lacking there. The
MVCC chapter can incorporate overview information as it is a strict
consequence of that implementation.

Statistics belong elsewhere - the tutorial should not use poor command
implementation choices as a guide for user education.

In short, this whole section should not exist and its content should be
moved to more appropriate areas (mainly MVCC). Vacuum is a tool that one
must use, but the narrative should be about the system generally.

David J.
