Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2021-10-08 17:56:20
Message-ID: 20211008175620.nspjy6wwmzvyzkrg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-10-01 16:05:31 -0400, Melanie Plageman wrote:
> From 40c809ad1127322f3462e85be080c10534485f0d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Fri, 24 Sep 2021 17:39:12 -0400
> Subject: [PATCH v13 1/4] Allow bootstrap process to beinit
>
> ---
> src/backend/utils/init/postinit.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 78bc64671e..fba5864172 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -670,8 +670,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> EnablePortalManager();
>
> /* Initialize status reporting */
> - if (!bootstrap)
> - pgstat_beinit();
> + pgstat_beinit();
>
> /*
> * Load relcache entries for the shared system catalogs. This must create
> --
> 2.27.0
>

I think it's good to remove more and more of these !bootstrap cases - they
really make it harder to understand the state of the system at various
points. Optimizing for the rarely executed bootstrap mode at the cost of
checks in very common codepaths...

> From a709ddb30b2b747beb214f0b13cd1e1816094e6b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Thu, 30 Sep 2021 16:16:22 -0400
> Subject: [PATCH v13 2/4] Add utility to make tuplestores for pg stat views
>
> Most of the steps to make a tuplestore for those pg_stat views requiring
> one are the same. Consolidate them into a single helper function for
> clarity and to avoid bugs.
> ---
> src/backend/utils/adt/pgstatfuncs.c | 129 ++++++++++------------------
> 1 file changed, 44 insertions(+), 85 deletions(-)
>
> diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
> index ff5aedc99c..513f5aecf6 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -36,6 +36,42 @@
>
> #define HAS_PGSTAT_PERMISSIONS(role) (is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role))
>
> +/*
> + * Helper function for views with multiple rows constructed from a tuplestore
> + */
> +static Tuplestorestate *
> +pg_stat_make_tuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
> +{
> + Tuplestorestate *tupstore;
> + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> + MemoryContext per_query_ctx;
> + MemoryContext oldcontext;
> +
> + /* check to see if caller supports us returning a tuplestore */
> + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("set-valued function called in context that cannot accept a set")));
> + if (!(rsinfo->allowedModes & SFRM_Materialize))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("materialize mode required, but it is not allowed in this context")));
> +
> + /* Build a tuple descriptor for our result type */
> + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
> + elog(ERROR, "return type must be a row type");
> +
> + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> + oldcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> + tupstore = tuplestore_begin_heap(true, false, work_mem);
> + rsinfo->returnMode = SFRM_Materialize;
> + rsinfo->setResult = tupstore;
> + rsinfo->setDesc = *tupdesc;
> + MemoryContextSwitchTo(oldcontext);
> + return tupstore;
> +}

Is pgstattuple the best place for this helper? It's not really pgstatfuncs
specific...

It also looks vaguely familiar - I wonder if we have a helper roughly like
this somewhere else already...

> From e9a5d2a021d429fdbb2daa58ab9d75a069f334d4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 29 Sep 2021 15:39:45 -0400
> Subject: [PATCH v13 3/4] Add system view tracking IO ops per backend type
>

> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> index be7366379d..0d18e7f71a 100644
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -1104,6 +1104,7 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> */
> if (!AmBackgroundWriterProcess())
> CheckpointerShmem->num_backend_fsync++;
> + pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED);
> LWLockRelease(CheckpointerCommLock);
> return false;
> }

ISTM this doens't need to happen while holding CheckpointerCommLock?

> @@ -1461,7 +1467,25 @@ pgstat_reset_shared_counters(const char *target)
> errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")));
>
> pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
> - pgstat_send(&msg, sizeof(msg));
> +
> + if (msg.m_resettarget == RESET_BUFFERS)
> + {
> + int backend_type;
> + PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> + memset(ops, 0, sizeof(ops));
> + pgstat_report_live_backend_io_path_ops(ops);
> +
> + for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
> + {
> + msg.m_backend_resets.backend_type = backend_type;
> + memcpy(&msg.m_backend_resets.iop, &ops[backend_type], sizeof(msg.m_backend_resets.iop));
> + pgstat_send(&msg, sizeof(msg));
> + }
> + }
> + else
> + pgstat_send(&msg, sizeof(msg));
> +
> }

I'd perhaps put this in a small helper function.

> /* ----------
> * pgstat_fetch_stat_dbentry() -
> @@ -2999,6 +3036,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> {
> Assert(!pgstat_is_shutdown);
>
> + /*
> + * Only need to send stats on IO Ops for IO Paths when a process exits, as
> + * pg_stat_get_buffers() will read from live backends' PgBackendStatus and
> + * then sum this with totals from exited backends persisted by the stats
> + * collector.
> + */
> + pgstat_send_buffers();
> +
> /*
> * If we got as far as discovering our own database ID, we can report what
> * we did to the collector. Otherwise, we'd be sending an invalid
> @@ -3092,6 +3137,30 @@ pgstat_send(void *msg, int len)
> #endif
> }

I think it might be nicer to move pgstat_beshutdown_hook() to be a
before_shmem_exit(), and do this in there.

> +/*
> + * Add live IO Op stats for all IO Paths (e.g. shared, local) to those in the
> + * equivalent stats structure for exited backends. Note that this adds and
> + * doesn't set, so the destination stats structure should be zeroed out by the
> + * caller initially. This would commonly be used to transfer all IO Op stats
> + * for all IO Paths for a particular backend type to the pgstats structure.
> + */

This seems a bit odd. Why not zero it in here? Perhaps it also should be
called something like _sum_ instead of _add_?

> +void
> +pgstat_add_io_path_ops(PgStatIOOps *dest, IOOps *src, int io_path_num_types)
> +{

Why is io_path_num_types a parameter?

> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> + int io_path;
> + PgStatIOOps *src_io_path_ops = msg->iop.io_path_ops;
> + PgStatIOOps *dest_io_path_ops =
> + globalStats.buffers.ops[msg->backend_type].io_path_ops;
> +
> + for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> + {
> + PgStatIOOps *src = &src_io_path_ops[io_path];
> + PgStatIOOps *dest = &dest_io_path_ops[io_path];
> +
> + dest->allocs += src->allocs;
> + dest->extends += src->extends;
> + dest->fsyncs += src->fsyncs;
> + dest->writes += src->writes;
> + }
> +}

Could this, with a bit of finessing, use pgstat_add_io_path_ops()?

> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c

What about writes originating in like FlushRelationBuffers()?

> bool
> -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
> +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring)
> {
> + /*
> + * If we decide to use the dirty buffer selected by StrategyGetBuffer(),
> + * then ensure that we count it as such in pg_stat_buffers view.
> + */
> + *from_ring = true;
> +

Absolutely minor nitpick: Somehow it feelsoff to talk about the view here.

> +PgBackendStatus *
> +pgstat_fetch_backend_statuses(void)
> +{
> + return BackendStatusArray;
> +}

Hm, not sure this adds much?

> + /*
> + * Subtract 1 from backend_type to avoid having rows for B_INVALID
> + * BackendType
> + */
> + int rownum = (beentry->st_backendType - 1) * IOPATH_NUM_TYPES + io_path;

Perhaps worth wrapping this in a macro or inline function? It's repeated and nontrivial.

> + /* Add stats from all exited backends */
> + backend_io_path_ops = pgstat_fetch_exited_backend_buffers();

It's probably *not* worth it, but I do wonder if we should do the addition on the SQL
level, and actually have two functions, one returning data for exited
backends, and one for currently connected ones.

> +static inline void
> +pgstat_inc_ioop(IOOp io_op, IOPath io_path)
> +{
> + IOOps *io_ops;
> + PgBackendStatus *beentry = MyBEEntry;
> +
> + Assert(beentry);
> +
> + io_ops = &beentry->io_path_stats[io_path];
> + switch (io_op)
> + {
> + case IOOP_ALLOC:
> + pg_atomic_write_u64(&io_ops->allocs,
> + pg_atomic_read_u64(&io_ops->allocs) + 1);
> + break;
> + case IOOP_EXTEND:
> + pg_atomic_write_u64(&io_ops->extends,
> + pg_atomic_read_u64(&io_ops->extends) + 1);
> + break;
> + case IOOP_FSYNC:
> + pg_atomic_write_u64(&io_ops->fsyncs,
> + pg_atomic_read_u64(&io_ops->fsyncs) + 1);
> + break;
> + case IOOP_WRITE:
> + pg_atomic_write_u64(&io_ops->writes,
> + pg_atomic_read_u64(&io_ops->writes) + 1);
> + break;
> + }
> +}

IIRC Thomas Munro had a patch adding a nonatomic_add or such
somewhere. Perhaps in the recovery readahead thread? Might be worth using
instead?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Max Shore 2021-10-08 18:00:23 ERROR: unexpected duplicate for tablespace 16389, relfilenode 484036
Previous Message Lily Liu 2021-10-08 17:24:33 Query rewrite(optimization) using constraints