Re: shared-memory based stats collector - v68

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Georgios <gkokolatos(at)protonmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: shared-memory based stats collector - v68
Date: 2022-04-04 17:54:15
Message-ID: 20220404175415.6xfeifewwefgxe5t@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-04-05 01:16:04 +1200, Thomas Munro wrote:
> On Mon, Apr 4, 2022 at 4:16 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Please take a look!
>
> A few superficial comments:
>
> > [PATCH v68 01/31] pgstat: consistent function header formatting.
> > [PATCH v68 02/31] pgstat: remove some superflous comments from pgstat.h.
>
> +1

Planning to commit these after making another coffee and proof reading them
some more.

> > [PATCH v68 03/31] dshash: revise sequential scan support.
>
> Logic looks good. That is,
> lock-0-and-ensure_valid_bucket_pointers()-only-once makes sense. Just
> some comment trivia:
>
> + * dshash_seq_term needs to be called when a scan finished. The caller may
> + * delete returned elements midst of a scan by using dshash_delete_current()
> + * if exclusive = true.
>
> s/scan finished/scan is finished/
> s/midst of/during/ (or /in the middle of/, ...)
>
> > [PATCH v68 04/31] dsm: allow use in single user mode.
>
> LGTM.

> + Assert(IsUnderPostmaster || !IsPostmasterEnvironment);

> (Not this patch's fault, but I wish we had a more explicit way to say "am
> single user".)

Agreed.

> > [PATCH v68 05/31] pgstat: stats collector references in comments
>
> LGTM. I could think of some alternative suggested names for this subsystem,
> but don't think it would be helpful at this juncture so I will refrain :-)

Heh. I did start a thread about it a while ago :)

> > [PATCH v68 08/31] pgstat: introduce PgStat_Kind enum.
>
> +#define PGSTAT_KIND_FIRST PGSTAT_KIND_DATABASE
> +#define PGSTAT_KIND_LAST PGSTAT_KIND_WAL
> +#define PGSTAT_NUM_KINDS (PGSTAT_KIND_LAST + 1)
>
> It's a little confusing that PGSTAT_NUM_KINDS isn't really the number of kinds,
> because there is no kind 0. For the two users of it... maybe just use
> pgstat_kind_infos[] = {...}, and
> global_valid[PGSTAT_KIND_LAST + 1]?

Maybe the whole justification for not defining an invalid kind is moot
now. There's not a single switch covering all kinds of stats left, and I hope
that we don't introduce one again...

> > [PATCH v68 10/31] pgstat: scaffolding for transactional stats creation / drop.
>
> + /*
> + * Dropping the statistics for objects that dropped transactionally itself
> + * needs to be transactional. ...
>
> Hard to parse. How about: "Objects are dropped transactionally, so
> related statistics need to be dropped transactionally too."

Not all objects are dropped transactionally. But I agree it reads awkwardly. I
now, incorporating feedback from Justin as well, rephrased it to:

/*
* Statistics for transactionally dropped objects need to be
* transactionally dropped as well. Collect the stats dropped in the
* current (sub-)transaction and only execute the stats drop when we know
* if the transaction commits/aborts. To handle replicas and crashes,
* stats drops are included in commit / abort records.
*/

A few too many "drop"s in there, but maybe that's unavoidable.

> + /*
> + * Whenever the for a dropped stats entry could not be freed (because
> + * backends still have references), this is incremented, causing backends
> + * to run pgstat_gc_entry_refs(), allowing that memory to be reclaimed.
> + */
> + pg_atomic_uint64 gc_count;
>
> Whenever the ...?

* Whenever statistics for dropped objects could not be freed - because
* backends still have references - the dropping backend calls
* pgstat_request_entry_refs_gc() incrementing this counter. Eventually
* that causes backends to run pgstat_gc_entry_refs(), allowing memory to
* be reclaimed.

> Would it be better to call this variable gc_request_count?

Agreed.

> + * Initialize refcount to 1, marking it as valid / not tdroped. The entry
>
> s/tdroped/dropped/
>
> + * further if a longer lived references is needed.
>
> s/references/reference/

Fixed.

> + /*
> + * There are legitimate cases where the old stats entry might not
> + * yet have been dropped by the time it's reused. The easiest case
> + * are replication slot stats. But oid wraparound can lead to
> + * other cases as well. We just reset the stats to their plain
> + * state.
> + */
> + shheader = pgstat_reinit_entry(kind, shhashent);
>
> This whole comment is repeated in pgstat_reinit_entry and its caller.

I guess I felt as indecisive about where to place it between the two locations
when I wrote it as I do now. Left it at the callsite for now.

> + /*
> + * XXX: Might be worth adding some frobbing of the allocation before
> + * freeing, to make it easier to detect use-after-free style bugs.
> + */
> + dsa_free(pgStatLocal.dsa, pdsa);
>
> FWIW dsa_free() clobbers memory in assert builds, just like pfree().

Oh. I could swear I saw that not being the case a while ago. But clearly it is
the case. Removed.

> +static Size
> +pgstat_dsa_init_size(void)
> +{
> + Size sz;
> +
> + /*
> + * The dshash header / initial buckets array needs to fit into "plain"
> + * shared memory, but it's beneficial to not need dsm segments
> + * immediately. A size of 256kB seems works well and is not
> + * disproportional compared to other constant sized shared memory
> + * allocations. NB: To avoid DSMs further, the user can configure
> + * min_dynamic_shared_memory.
> + */
> + sz = 256 * 1024;
>
> It kinda bothers me that the memory reserved by
> min_dynamic_shared_memory might eventually fill up with stats, and not
> be available for temporary use by parallel queries (which can benefit
> more from fast acquire/release on DSMs, and probably also huge pages,
> or maybe not...), and that's hard to diagnose.

It's not great, but I don't really see an alternative? The saving grace is
that it's hard to imagine "real" usages of min_dynamic_shared_memory being
used up by stats.

> + * (4) turn off the idle-in-transaction, idle-session and
> + * idle-state-update timeouts if active. We do this before step (5) so
>
> s/idle-state-/idle-stats-/
>
> + /*
> + * Some of the pending stats may have not been flushed due to lock
> + * contention. If we have such pending stats here, let the caller know
> + * the retry interval.
> + */
> + if (partial_flush)
> + {
>
> I think it's better for a comment that is outside the block to say "If
> some of the pending...". Or the comment should be inside the blocks.

The comment says "if" in the second sentence? But it's a bit awkward anyway,
rephrased to:

* If some of the pending stats could not be flushed due to lock
* contention, let the caller know when to retry.

> +static void
> +pgstat_build_snapshot(void)
> +{
> ...
> + dshash_seq_init(&hstat, pgStatLocal.shared_hash, false);
> + while ((p = dshash_seq_next(&hstat)) != NULL)
> + {
> ...
> + entry->data = MemoryContextAlloc(pgStatLocal.snapshot.context,
> ...
> + }
> + dshash_seq_term(&hstat);
>
> Doesn't allocation failure leave the shared hash table locked?

The shared table itself not - the error path does LWLockReleaseAll(). The
problem is the backend local dshash_table, specifically
find_[exclusively_]locked will stay set, and then cause assertion failures
when used next.

I think we need to fix that in dshash.c. We have code in released branches
that's vulnerable to this problem. E.g.
ensure_record_cache_typmod_slot_exists() in lookup_rowtype_tupdesc_internal().

See also
https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de

Afaics the only real choice is to remove find_[exclusively_]locked and rely on
LWLockHeldByMeInMode() instead.

> > PATCH v68 16/31] pgstat: add pg_stat_exists_stat() for easier testing.
>
> pg_stat_exists_stat() is a weird name, ... would it be better as
> pg_stat_object_exists()?

I was fighting with this one a bunch :). Earlier it was called
pg_stat_stats_exist() I think. "object" makes it sound a bit too much like
it's the database object?

Maybe pg_stat_have_stat()?

> > [PATCH v68 28/31] pgstat: update docs.
>
> + Determines the behaviour when cumulative statistics are accessed
>
> AFAIK our manual is written in en_US, so s/behaviour/behavior/.

Fixed like 10 instances of this in the patchset. Not sure why I just can't
make myself type behavior.

> + memory. When set to <literal>cache</literal>, the first access to
> + statistics for an object caches those statistics until the end of the
> + transaction / until <function>pg_stat_clear_snapshot()</function> is
>
> s|/|unless|
>
> + <literal>none</literal> is most suitable for monitoring solutions. If
>
> I'd change "solutions" to "tools" or maybe "systems".

Done.

> + When using the accumulated statistics views and functions to
> monitor collected data, it is important
>
> Did you intend to write "accumulated" instead of "cumulative" here?

Not sure. I think I got bored of the word at some point :P

> + You can invoke <function>pg_stat_clear_snapshot</function>() to discard the
> + current transaction's statistics snapshot / cache (if any). The next use
>
> I'd change s|/ cache|or cached values|. I think "/" like this is an informal
> thing.

I think we have a few other uses of it. But anyway, changed.

Thanks!

Andres

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2022-04-04 17:56:11 Re: New Object Access Type hooks
Previous Message Tom Lane 2022-04-04 17:51:13 Re: New Object Access Type hooks