Re: shared-memory based stats collector - v69

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: shared-memory based stats collector - v69
Date: 2022-04-05 21:43:49
Message-ID: CAKFQuwa8RZ13Sw=GMWgNysccZs3efm1t5zU-ONcGU45OkemRcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 2:23 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> On 2022-04-05 13:51:12 -0700, David G. Johnston wrote:
>
> >, but rather add to the shared queue
>
> Queue? Maybe you mean the hashtable?
>

Queue implemented by a list...? Anyway, I think I mean this:

/*
* List of PgStat_EntryRefs with unflushed pending stats.
*
* Newly pending entries should only ever be added to the end of the list,
* otherwise pgstat_flush_pending_entries() might not see them immediately.
*/
static dlist_head pgStatPending = DLIST_STATIC_INIT(pgStatPending);

>
>
> >, and so the cache is effectively read-only. It is also
> transaction-scoped
> >based upon the GUC and the nature of stats vis-a-vis transactions.
>
> No, that's not right. I think you might be thinking of
> pgStatLocal.snapshot.stats?
>
>
Probably...

> I guess I should add a paragraph about snapshots / fetch consistency.
>

I apparently confused/combined the two concepts just now so that would help.

>
> > Even before I added the read-only and transaction-scoped I got a bit hung
> > up on reading:
> > "The shared hashtable only needs to be accessed when no prior reference
> to
> > the shared hashtable exists."
>
> > Thinking in terms of key seems to make more sense than value in this
> > sentence - even if there is a one-to-one correspondence.
>
> Maybe "prior reference to the shared hashtable exists for the key"?
>

I specifically dislike having two mentions of the "shared hashtable" in the
same sentence, so I tried to phrase the second half in terms of the local
hashtable.

> > I am wondering why there are no mentions to the header files in this
> > architecture, only the .c files.
>
> Hm, I guess, but I'm not sure it'd add a lot? It's really just intended to
> give a starting point (and it can't be worse than explanation of the
> current
> system).
>

No need to try to come up with something. More curious if there was a
general reason to avoid it before I looked to see if I felt anything in
them seemed worth including from my perspective.

>
>
> > diff --git a/src/backend/utils/activity/pgstat.c
> b/src/backend/utils/activity/pgstat.c
> > index bfbfe53deb..504f952c0e 100644
> > --- a/src/backend/utils/activity/pgstat.c
> > +++ b/src/backend/utils/activity/pgstat.c
> > @@ -4,9 +4,9 @@
> > *
> > *
> > * PgStat_KindInfo describes the different types of statistics handled.
> Some
> > - * kinds of statistics are collected for fixed number of objects
> > - * (e.g. checkpointer statistics). Other kinds are statistics are
> collected
> > - * for variable-numbered objects (e.g. relations).
> > + * kinds of statistics are collected for a fixed number of objects
> > + * (e.g., checkpointer statistics). Other kinds of statistics are
> collected
>
> Was that comma after e.g. intentional?
>

It is. That is the style I was taught, and that we seem to adhere to in
user-facing documentation. Source code is a mixed bag with no enforcement,
but while we are here...

> > + * for a varying number of objects (e.g., relations).
> > * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
>

status-quo works for me too, and matches up with the desired labelling we
are using here.

> > *
> > @@ -19,19 +19,21 @@
> > *
> > * All variable-numbered stats are addressed by PgStat_HashKey while
> running.
> > * It is not possible to have statistics for an object that cannot be
> > - * addressed that way at runtime. A wider identifier can be used when
> > + * addressed that way at runtime. A alternate identifier can be used
> when
> > * serializing to disk (used for replication slot stats).
>
> Not sure this improves things.
>
>
It just seems odd that width is being mentioned when the actual struct is a
combination of three subcomponents. I do feel I'd need to understand
exactly what replication slot stats are doing uniquely here, though, to
make any point beyond that.

>
> > - * Each statistics kind is handled in a dedicated file:
> > + * Each statistics kind is handled in a dedicated file, though their
> structs
> > + * are defined here for lack of better ideas.
>
> -0.5
>
>
Status-quo works for me. Food for thought for other reviewers though.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-04-05 21:50:02 Re: Granting SET and ALTER SYSTE privileges for GUCs
Previous Message Andres Freund 2022-04-05 21:23:38 Re: shared-memory based stats collector - v69