Re: Naming of the different stats systems / "stats collector"

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: Naming of the different stats systems / "stats collector"
Date: 2022-03-08 22:55:04
Message-ID: CAKFQuwbJHjEfsN4b1jt6FVJnbS-0yp-XAxGyVq8qzhUNMwzkmA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 8, 2022 at 1:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> One thing I'm not yet happy about in the shared memory stats patch is
> naming. Currently a lot of comments say things like:
>
> * [...] We convert to
> * microseconds in PgStat_Counter format when transmitting to the
> * collector.
>
> or
>
> # - Query and Index Statistics Collector -
>
> or
>
> /* ----------
> * pgstat_report_subscription_drop() -
> *
> * Tell the collector about dropping the subscription.
> * ----------
> */
>
>
> The immediate question for the patch is what to replace "collector" with.
>
>
I'm not really following the broader context here, so this came out of
nowhere for me. What is the argument for changing the status quo here?
"Collector" seems like a good term.

>
> The patch currently uses "activity statistics" in a number of places, but
> that is confusing too, because pg_stat_activity is a different kind of
> stats.
>
> Any ideas?
>

If the complaint is that not all of these statistics modules use the
statistics collector, then maybe we say each non-collector module defines an
"Event Listener". Or (and I say this without having looked at the source
code) have the collector simply forward events like "reset now" to the
appropriate module, but keep the collector as the single point of message
interchange for all of them. Then "tell the collector about" remains an
accurate description of what happens.

>
> The postgresql.conf.sample section header seems particularly odd - "index
> statistics"? We collect more data about tables etc.
>

No objection to bringing the header up to date.

>
> A more general point: Our naming around different types of stats is
> horribly confused. We have stats describing the current state (e.g.
> pg_stat_activity, pg_stat_replication, pg_stat_progress_*, ...) and
> accumulated stats (pg_stat_user_tables, pg_stat_database, etc) in the same
> namespace. Should we try to move towards something more coherent, at least
> going forward?
>
>
I'm not sure that trying to improve this going forward, and thus ending up
with at least three categories, is particularly desirable. While it is
unfortunate that we don't have separate pg_metric and pg_status namespaces
(pairing pg_stat with pg_status or pg_state, the two obvious choices, would
be undesirable since the names all share the same leading characters), that
is where we are today. We are probably stuck with using just the pg_stat
namespace and doing a better job of telling users which implementation each
pg_stat relation uses, so they can judge whether what is being reported is
considered reliable (self-managed shared memory) or not (relies on the
unreliable collector). In short, deal with this mainly in documentation,
comments, and implementation details, but leave the public-facing naming
alone.

David J.
