Re: shared-memory based stats collector - v70

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: shared-memory based stats collector - v70
Date: 2022-04-08 03:51:10
Message-ID: CAKFQuwbcj8WszSvnR+oqAv9CGtqwVS-OWg6fSAiZH-LxmtWreA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 7:10 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Thu, 7 Apr 2022 16:37:51 -0700, Andres Freund <andres(at)anarazel(dot)de>
> wrote in
> > Hi,
> >
> > On 2022-04-07 00:28:45 -0700, Andres Freund wrote:
> > > I've gotten through the main commits (and then a fix for the apparently
> > > inevitable bug that's immediately highlighted by the buildfarm), and
> the first
> > > test. I'll call it a night now, and work on the other tests & docs
> tomorrow.
> >
> > I've gotten through the tests now. There's one known, not yet addressed,
> issue
> > with the stats isolation test, see [1].
> >
> >
> > Working on the docs. Found a few things worth raising:
> >
> > 1)
> > Existing text:
> > When the server shuts down cleanly, a permanent copy of the statistics
> > data is stored in the <filename>pg_stat</filename> subdirectory, so
> that
> > statistics can be retained across server restarts. When recovery is
> > performed at server start (e.g., after immediate shutdown, server
> crash,
> > and point-in-time recovery), all statistics counters are reset.
> >
> > The existing docs patch hadn't updated yet. My current edit is
> >
> > When the server shuts down cleanly, a permanent copy of the statistics
> > data is stored in the <filename>pg_stat</filename> subdirectory, so
> that
> > statistics can be retained across server restarts. When crash
> recovery is
> > performed at server start (e.g., after immediate shutdown, server
> crash,
> > and point-in-time recovery, but not when starting a standby that was
> shut
> > down normally), all statistics counters are reset.
> >
> > but I'm not sure the parenthetical is easy enough to understand?
>
> I can read it. But I'm not sure that the difference is obvious for
> average users between "starting a standby from a basebackup" and
> "starting a standby after a normal shutdown"..
>
> Other than that, it might be easier to read if the additional part
> were moved out to the end of the paragraph, prefixing with "Note:
> ". For example,
>
> ...
> statistics can be retained across server restarts. When crash recovery is
> performed at server start (e.g., after immediate shutdown, server crash,
> and point-in-time recovery), all statistics counters are reset. Note that
> crash recovery is not performed when starting a standby that was shut
> down normally then all counters are retained.
>
>
Maybe:
When the server shuts down cleanly a permanent copy of the statistics data
is stored in the <filename>pg_stat</filename> subdirectory so that
statistics can be retained across server restarts. However, if crash
recovery is performed (i.e., after immediate shutdown, server crash, or
point-in-time recovery), all statistics counters are reset. For any
standby server, the initial startup to get the cluster initialized is a
point-in-time crash recovery startup. For all subsequent startups it
behaves like any other server. For a hot standby server, statistics are
retained during a failover promotion.

I'm pretty sure i.e., is correct since those three situations are not
examples but rather the complete set.

Is crash recovery ever performed other than at server start? If so I
choose to remove the redundancy.

I feel like some of this detail about standby servers is/should be covered
elsewhere and we are at least missing a cross-reference chance even if we
leave the material coverage as-is.

> 2)
> > The edit is not a problem, but it's hard to understand what the existing
> > paragraph actually means?
> >
> > diff --git a/doc/src/sgml/high-availability.sgml
> b/doc/src/sgml/high-availability.sgml
> > index 3247e056663..8bfb584b752 100644
> > --- a/doc/src/sgml/high-availability.sgml
> > +++ b/doc/src/sgml/high-availability.sgml
> > @@ -2222,17 +2222,17 @@ HINT: You can then restart the server after
> making the necessary configuration
> > ...
> > <para>
> > - The statistics collector is active during recovery. All scans,
> reads, blocks,
> > + The cumulative statistics system is active during recovery. All
> scans, reads, blocks,
> > index usage, etc., will be recorded normally on the standby.
> Replayed
> > actions will not duplicate their effects on primary, so replaying an
> > insert will not increment the Inserts column of pg_stat_user_tables.
> > The stats file is deleted at the start of recovery, so stats from
> primary
> > and standby will differ; this is considered a feature, not a bug.
> > </para>
> >
> > <para>
>
> Agreed partially. It's too detailed. It might not need to mention WAL
> replay.
>
>
The insert example seems like a poor one...IIUC cumulative statistics are
not WAL logged and while in recovery INSERT is prohibited, so how would
replaying the insert in the WAL result in a duplicated effect on
pg_stat_user_tables.inserts?

I also have no idea what, in the fragment, "Replayed actions will not
duplicate their effects on primary...", what "on primary" is supposed to
mean.

I would like to write the following but I don't believe it is sufficiently
true:

"The cumulative statistics system records only the locally generated
activity of the cluster, including while in recovery. Activity happening
only due to the replay of WAL is not considered local."

But to apply the WAL we have to fetch blocks from the local filesystem and
write them back out. That is local activity happening due to the replay of
WAL which sounds like it is and should be reported ("All...blocks...", and
the example given being logical DDL oriented).

I cannot think of a better paragraph at the moment, the minimal change is
good, and the detail it contains presently seems like the right amount, if
indeed my interpretation of it is correct (i.e., the standby records
physical stats, not logical ones). It still has wording issues around "on
primary" and maybe a better example choice than a
disallowed-in-recovery-anyway insert.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-08 03:59:21 Re: shared-memory based stats collector - v70
Previous Message Greg Stark 2022-04-08 03:39:47 Re: pg14 psql broke \d datname.nspname.relname