Re: Naming of the different stats systems / "stats collector"

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: david(dot)g(dot)johnston(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Naming of the different stats systems / "stats collector"
Date: 2022-03-09 01:34:58
Message-ID: 20220309.103458.2187559138877561894.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 8 Mar 2022 15:55:04 -0700, "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> wrote in
> On Tue, Mar 8, 2022 at 1:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > the immediate question for the patch is what to replace "collector" with.
> >
> >
> Not really following the broader context here so this came out of nowhere
> for me. What is the argument for changing the status quo here? Collector
> seems like good term.

The name "stats collector" is tied to the story that "there is a
process that only collects stats data arriving from working
processes". We do have modules with such a dedicated process: bgwriter,
checkpointer, walwriter and so on. On the other hand, we have many
features that have no dedicated process and instead operate on a shared
storage area as part of the working processes: table/column statistics,
XLOG, heap, SLRU and so on.

In a world where every working process writes statistics to a shared
memory area on its own, no such process exists. I don't think we can
keep calling it the "stats collector".
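
To make the contrast concrete, here is a toy sketch in Python. It is
only an illustration under my own assumptions, not code from the patch:
threads stand in for backend processes, and the function names and the
"tuples_inserted" counter are made up for the example. The first
function has a dedicated collector that receives messages; the second
has no collector at all, each worker updates the shared counters itself.

```python
import queue
import threading
from collections import Counter


def run_collector_model(events_per_worker, n_workers):
    """Workers send messages; one dedicated collector thread tallies them."""
    inbox = queue.Queue()
    totals = Counter()

    def collector():
        while True:
            msg = inbox.get()
            if msg is None:  # sentinel: all workers are done
                break
            totals[msg] += 1

    def worker():
        for _ in range(events_per_worker):
            inbox.put("tuples_inserted")  # hypothetical counter name

    collector_thread = threading.Thread(target=collector)
    collector_thread.start()
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    inbox.put(None)
    collector_thread.join()
    return totals


def run_shared_model(events_per_worker, n_workers):
    """Every worker updates the shared counters itself; no collector exists."""
    totals = Counter()
    lock = threading.Lock()

    def worker():
        for _ in range(events_per_worker):
            with lock:  # protects the shared area from concurrent updates
                totals["tuples_inserted"] += 1

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return totals
```

Both functions return the same totals, but only the first one contains
anything that could reasonably be called a "collector".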

> > The patch currently uses "activity statistics" in a number of places, but
> > that is confusing too, because pg_stat_activity is a different kind of
> > stats.
> >
> > Any ideas?
> >
>
> If the complaint is that not all of these statistics modules use the
> statistics collector then maybe we say each non-collector module defines an
> "Event Listener". Or, and without looking at the source code, have the
> collector simply forward events like "reset now" to the appropriate module
> but keep the collector as the single point of message interchange for all.
> And so "tell the collector about" is indeed the correct phrasing of what
> happens.

So the collector as a process is going to die. We need an alternative
name for the non-collector. Metrics, as you mentioned below, sounds
good to me. The name "activity stat(istic)?s" came from my desire to
distinguish it from "table/column statistics", but I have to admit
that it is still not great.

> > The postgresql.conf.sample section header seems particularly odd - "index
> > statistics"? We collect more data about tables etc.
> >
>
> No argument for bringing the header current.
>
> >
> > A more general point: Our naming around different types of stats is
> > horribly confused. We have stats describing the current state (e.g.
> > pg_stat_activity, pg_stat_replication, pg_stat_progress_*, ...) and
> > accumulated stats (pg_stat_user_tables, pg_stat_database, etc) in the
> > same namespace. Should we try to move towards something more coherent,
> > at least going forward?
> >
> >
> I'm not sure trying to improve this going forward, and thus having at least
> three categories, is particularly desirable. While it is unfortunate that
> we don't have separate pg_metric and pg_status namespaces (combining
> pg_stat with pg_status or pg_state, the two obvious choices, would be
> undesirable being they all have a shared leading character sequence) that
> is where we are today. We are probably stuck with just using the pg_stat
> namespace and doing a better job of letting users know about the underlying
> implementation choice each pg_stat relation took in order to know whether
> what is being reported is considered reliable (self-managed shared memory)
> or not (leverages the unreliable collector). In short, deal with this
> mainly in documentation/comments and implementation details but leave the
> public facing naming alone.
>
> David J.

If we could, I would like names such as pg_metrics.process,
pg_metrics.replication, pg_progress.vacuum, pg_progress.basebackup,
and pg_stats.database, pg_stats.user_tables. Seen that way, it
looks somewhat odd that the pg_stat_* views belong to the
pg_catalog namespace.

If we had system table aliases, people who insist on the good old
names could live with them. Even if we don't, we can provide views
with the old names instead.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
