Re: shared-memory based stats collector - v68

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, gkokolatos(at)protonmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: shared-memory based stats collector - v68
Date: 2022-04-05 02:03:13
Message-ID: CAKFQuwY3SHj6Dc64zgSMWccvWipG0H2GLXP3cNXvSbk0UE0mBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 4, 2022 at 3:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2022-04-04 15:24:24 -0700, David G. Johnston wrote:
> > Replacing the existing assert(!kind->fixed_amount) with
> > assert(!kind->accessed_across_databases) produces the same result as the
> > later presently implies the former.
>
> I wasn't proposing to replace, but to add...
>

Right, but it seems redundant to have both when one implies the other. But
I'm not hard set against it either, though my idea below make them both
obsolete.

>
> > Now I start to dislike the behavioral aspect of the attribute and would
> > rather just name it: kind->is_cluster_scoped (or something else that is
> > descriptive of the stat category itself, not how it is used)
>
> I'm not in love with the name either. But cluster is just a badly
> overloaded
> word :(.
>
> system_wide? Or invert it and say: database_scoped? I think I like the
> latter.
>
>
I like database_scoped as well...but see my idea below that makes this
obsolete.

>
> > Then reorganize the Kind documentation to note and emphasize these two
> > primary descriptors:
> > variable, which can be cluster or database scoped
> > fixed, which are cluster scoped by definition
>
> Hm. There's not actually that much difference between cluster/non-cluster
> wide
> scope for most of the system. I'm not strongly against, but I'm also not
> really seeing the benefit.
>

Not married to it myself, something to come back to when the dust settles.

> > (if this is true...but given this is an optimization category I'm
> thinking
> > maybe it doesn't actually matter...)
>
> It is true. Not sure what you mean with "optimization category"?
>
>
I mean that distinguishing between stats that are fixed and those that are
variable implies that fixed kinds have a better performance (speed, memory)
characteristic than variable kinds (at least in part due to the presence of
changecount). If fixed kinds did not have a performance benefit then
having the variable kind implementation simply handle fixed kinds as well
(using the common struct header and storage in a hash table) would make the
implementation simpler since all statistics would report through the same
API. In that world, variability is simply a possibility that not every
actual reporter has to use. That improved performance characteristic is
what I meant by "optimization category". I question whether we should be
publishing "fixed" and "variable" as concrete properties. I'm not
presently against the current choice to do so, but as you say above, I'm
also not really seeing the benefit.

(goes and looks at all the places that use the fixed_amount
field...sparking an idea)

Coming back to this:
"""
+ /* cluster-scoped object stats having a variable number of entries */
+ PGSTAT_KIND_REPLSLOT = 1, /* per-slot statistics */
+ PGSTAT_KIND_SUBSCRIPTION, /* per-subscription statistics */
+ PGSTAT_KIND_DATABASE, /* database-wide statistics */ (I moved this to 3rd
spot to be closer to the database-scoped options)
+
+ /* database-scoped object stats having a variable number of entries */
+ PGSTAT_KIND_RELATION, /* per-table statistics */
+ PGSTAT_KIND_FUNCTION, /* per-function statistics */
+
+ /* cluster-scoped stats having a fixed number of entries */ (maybe these
should go first, the variable following?)
+ PGSTAT_KIND_ARCHIVER,
+ PGSTAT_KIND_BGWRITER,
+ PGSTAT_KIND_CHECKPOINTER,
+ PGSTAT_KIND_SLRU,
+ PGSTAT_KIND_WAL,
"""

I see three "KIND_GROUP" categories here:
PGSTAT_KIND_CLUSTER (open to a different word here though...)
PGSTAT_KIND_DATABASE (we seem to agree on this above)
PGSTAT_KIND_GLOBAL (already used in the code)

This single enum can replace the two booleans that, in combination, would
define 4 unique groups (of which only three are interesting -
database+fixed doesn't seem interesting and so is not given a name/value
here).

While the succinctness of the booleans has appeal the need for half of the
booleans to end up being negated quickly tarnishes it. With the three
groups, every assertion is positive in nature indicating which of the three
groups are handled by the function. While that is probably a few more
characters it seems like an easier read and is less complicated as it has
fewer independent parts. At most you OR two kinds together which is
succinct enough I would think. There are no gaps relative to the existing
implementation that defines fixed_amount and accessed_across_databases -
every call site using either of them can be transformed mechanically.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-05 02:10:30 Re: PATCH: add "--config-file=" option to pg_rewind
Previous Message vignesh C 2022-04-05 01:54:37 Re: Handle infinite recursion in logical replication setup