Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2021-10-11 20:48:01
Message-ID: CAAKRu_YJMHJ66-ZB86zFCH=Cq4+5w2gnJ14bu8oyZec0C39Dcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 8, 2021 at 1:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-10-01 16:05:31 -0400, Melanie Plageman wrote:
> > From 40c809ad1127322f3462e85be080c10534485f0d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Fri, 24 Sep 2021 17:39:12 -0400
> > Subject: [PATCH v13 1/4] Allow bootstrap process to beinit
> >
> > ---
> > src/backend/utils/init/postinit.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> > index 78bc64671e..fba5864172 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -670,8 +670,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> > EnablePortalManager();
> >
> > /* Initialize status reporting */
> > - if (!bootstrap)
> > - pgstat_beinit();
> > + pgstat_beinit();
> >
> > /*
> > * Load relcache entries for the shared system catalogs. This must create
> > --
> > 2.27.0
> >
>
> I think it's good to remove more and more of these !bootstrap cases - they
> really make it harder to understand the state of the system at various
> points. Optimizing for the rarely executed bootstrap mode at the cost of
> checks in very common codepaths...

What scope do you suggest for this patch set? A single patch which does
this in more locations (remove !bootstrap) or should I remove this patch
from the patchset?

>
>
>
> > From a709ddb30b2b747beb214f0b13cd1e1816094e6b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Thu, 30 Sep 2021 16:16:22 -0400
> > Subject: [PATCH v13 2/4] Add utility to make tuplestores for pg stat views
> >
> > Most of the steps to make a tuplestore for those pg_stat views requiring
> > one are the same. Consolidate them into a single helper function for
> > clarity and to avoid bugs.
> > ---
> > src/backend/utils/adt/pgstatfuncs.c | 129 ++++++++++------------------
> > 1 file changed, 44 insertions(+), 85 deletions(-)
> >
> > diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
> > index ff5aedc99c..513f5aecf6 100644
> > --- a/src/backend/utils/adt/pgstatfuncs.c
> > +++ b/src/backend/utils/adt/pgstatfuncs.c
> > @@ -36,6 +36,42 @@
> >
> > #define HAS_PGSTAT_PERMISSIONS(role) (is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role))
> >
> > +/*
> > + * Helper function for views with multiple rows constructed from a tuplestore
> > + */
> > +static Tuplestorestate *
> > +pg_stat_make_tuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
> > +{
> > + Tuplestorestate *tupstore;
> > + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > + MemoryContext per_query_ctx;
> > + MemoryContext oldcontext;
> > +
> > + /* check to see if caller supports us returning a tuplestore */
> > + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("set-valued function called in context that cannot accept a set")));
> > + if (!(rsinfo->allowedModes & SFRM_Materialize))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("materialize mode required, but it is not allowed in this context")));
> > +
> > + /* Build a tuple descriptor for our result type */
> > + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
> > + elog(ERROR, "return type must be a row type");
> > +
> > + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> > + oldcontext = MemoryContextSwitchTo(per_query_ctx);
> > +
> > + tupstore = tuplestore_begin_heap(true, false, work_mem);
> > + rsinfo->returnMode = SFRM_Materialize;
> > + rsinfo->setResult = tupstore;
> > + rsinfo->setDesc = *tupdesc;
> > + MemoryContextSwitchTo(oldcontext);
> > + return tupstore;
> > +}
>
> Is pgstattuple the best place for this helper? It's not really pgstatfuncs
> specific...
>
> It also looks vaguely familiar - I wonder if we have a helper roughly like
> this somewhere else already...
>

I don't see a function which is specifically a utility to make a
tuplestore. Looking at the callers of tuplestore_begin_heap(), I notice
very similar code to the function I added in pg_tablespace_databases()
in utils/adt/misc.c, pg_stop_backup_v2() in xlogfuncs.c,
pg_event_trigger_dropped_objects() and pg_event_trigger_ddl_commands in
event_tigger.c, pg_available_extensions in extension.c, etc.

Do you think it makes sense to refactor this code out of all of these
places? If so, where would such a utility function belong?

>
>
> > From e9a5d2a021d429fdbb2daa58ab9d75a069f334d4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Wed, 29 Sep 2021 15:39:45 -0400
> > Subject: [PATCH v13 3/4] Add system view tracking IO ops per backend type
> >
>
> > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> > index be7366379d..0d18e7f71a 100644
> > --- a/src/backend/postmaster/checkpointer.c
> > +++ b/src/backend/postmaster/checkpointer.c
> > @@ -1104,6 +1104,7 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> > */
> > if (!AmBackgroundWriterProcess())
> > CheckpointerShmem->num_backend_fsync++;
> > + pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED);
> > LWLockRelease(CheckpointerCommLock);
> > return false;
> > }
>
> ISTM this doens't need to happen while holding CheckpointerCommLock?
>

Fixed in attached updates. I only attached the diff from my previous version.

>
>
> > @@ -1461,7 +1467,25 @@ pgstat_reset_shared_counters(const char *target)
> > errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")));
> >
> > pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
> > - pgstat_send(&msg, sizeof(msg));
> > +
> > + if (msg.m_resettarget == RESET_BUFFERS)
> > + {
> > + int backend_type;
> > + PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> > +
> > + memset(ops, 0, sizeof(ops));
> > + pgstat_report_live_backend_io_path_ops(ops);
> > +
> > + for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
> > + {
> > + msg.m_backend_resets.backend_type = backend_type;
> > + memcpy(&msg.m_backend_resets.iop, &ops[backend_type], sizeof(msg.m_backend_resets.iop));
> > + pgstat_send(&msg, sizeof(msg));
> > + }
> > + }
> > + else
> > + pgstat_send(&msg, sizeof(msg));
> > +
> > }
>
> I'd perhaps put this in a small helper function.
>

Done.

>
> > /* ----------
> > * pgstat_fetch_stat_dbentry() -
> > @@ -2999,6 +3036,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> > {
> > Assert(!pgstat_is_shutdown);
> >
> > + /*
> > + * Only need to send stats on IO Ops for IO Paths when a process exits, as
> > + * pg_stat_get_buffers() will read from live backends' PgBackendStatus and
> > + * then sum this with totals from exited backends persisted by the stats
> > + * collector.
> > + */
> > + pgstat_send_buffers();
> > +
> > /*
> > * If we got as far as discovering our own database ID, we can report what
> > * we did to the collector. Otherwise, we'd be sending an invalid
> > @@ -3092,6 +3137,30 @@ pgstat_send(void *msg, int len)
> > #endif
> > }
>
> I think it might be nicer to move pgstat_beshutdown_hook() to be a
> before_shmem_exit(), and do this in there.
>

Not really sure the correct way to do this. A cursory attempt to do so
failed because ShutdownXLOG() is also registered as a
before_shmem_exit() and ends up being called after
pgstat_beshutdown_hook(). pgstat_beshutdown_hook() zeroes out
PgBackendStatus, ShutdownXLOG() initiates a checkpoint, and during a
checkpoint, the checkpointer increments IO op counter for writes in its
PgBackendStatus.

>
> > +/*
> > + * Add live IO Op stats for all IO Paths (e.g. shared, local) to those in the
> > + * equivalent stats structure for exited backends. Note that this adds and
> > + * doesn't set, so the destination stats structure should be zeroed out by the
> > + * caller initially. This would commonly be used to transfer all IO Op stats
> > + * for all IO Paths for a particular backend type to the pgstats structure.
> > + */
>
> This seems a bit odd. Why not zero it in here? Perhaps it also should be
> called something like _sum_ instead of _add_?
>

I wanted to be able to use the function both when it was setting the
values and when it needed to add to the values (which are the two
current callers). I have changed the name from add -> sum.

>
> > +void
> > +pgstat_add_io_path_ops(PgStatIOOps *dest, IOOps *src, int io_path_num_types)
> > +{
>
> Why is io_path_num_types a parameter?
>

I imagined that maybe another caller would want to only add some IO path
types and still use the function, but I think it is more confusing than
anything else so I've changed it.

>
> > +static void
> > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> > +{
> > + int io_path;
> > + PgStatIOOps *src_io_path_ops = msg->iop.io_path_ops;
> > + PgStatIOOps *dest_io_path_ops =
> > + globalStats.buffers.ops[msg->backend_type].io_path_ops;
> > +
> > + for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> > + {
> > + PgStatIOOps *src = &src_io_path_ops[io_path];
> > + PgStatIOOps *dest = &dest_io_path_ops[io_path];
> > +
> > + dest->allocs += src->allocs;
> > + dest->extends += src->extends;
> > + dest->fsyncs += src->fsyncs;
> > + dest->writes += src->writes;
> > + }
> > +}
>
> Could this, with a bit of finessing, use pgstat_add_io_path_ops()?
>

I didn't really see a good way to do this -- given that
pgstat_add_io_path_ops() adds IOOps members to PgStatIOOps members --
which requires a pg_atomic_read_u64() and pgstat_recv_io_path_ops adds
PgStatIOOps to PgStatIOOps which doesn't require pg_atomic_read_u64().
Maybe I could pass a flag which, based on the type, either does or
doesn't use pg_atomic_read_u64 to access the value? But that seems worse
to me.

>
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
>
> What about writes originating in like FlushRelationBuffers()?
>

Yes, I have made IOPath a parameter to FlushBuffer() so that it can
distinguish between strategy buffer writes and shared buffer writes and
then pushed pgstat_inc_ioop() into FlushBuffer().

>
> > bool
> > -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
> > +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring)
> > {
> > + /*
> > + * If we decide to use the dirty buffer selected by StrategyGetBuffer(),
> > + * then ensure that we count it as such in pg_stat_buffers view.
> > + */
> > + *from_ring = true;
> > +
>
> Absolutely minor nitpick: Somehow it feelsoff to talk about the view here.

Fixed.

>
>
> > +PgBackendStatus *
> > +pgstat_fetch_backend_statuses(void)
> > +{
> > + return BackendStatusArray;
> > +}
>
> Hm, not sure this adds much?

Is there a better way to access the whole BackendStatusArray from within
pgstatfuncs.c?

>
>
> > + /*
> > + * Subtract 1 from backend_type to avoid having rows for B_INVALID
> > + * BackendType
> > + */
> > + int rownum = (beentry->st_backendType - 1) * IOPATH_NUM_TYPES + io_path;
>
>
> Perhaps worth wrapping this in a macro or inline function? It's repeated and nontrivial.
>

Done.

>
> > + /* Add stats from all exited backends */
> > + backend_io_path_ops = pgstat_fetch_exited_backend_buffers();
>
> It's probably *not* worth it, but I do wonder if we should do the addition on the SQL
> level, and actually have two functions, one returning data for exited
> backends, and one for currently connected ones.
>

It would be easy enough to implement. I would defer to others on whether
or not this would be useful. My use case for pg_stat_buffers() is to see
what backends' IO during a benchmark or test workload. For that, I reset
the stats before and then query pg_stat_buffers after running the
benchmark. I don't know if I would use exited and live stats
individually. In a real workload, I could see using
pg_stat_buffers live and exited to see if the workload causing lots of
backends to do their own writes is ongoing. Though a given workload may
be composed of lots of different queries, with backends exiting
throughout.

>
> > +static inline void
> > +pgstat_inc_ioop(IOOp io_op, IOPath io_path)
> > +{
> > + IOOps *io_ops;
> > + PgBackendStatus *beentry = MyBEEntry;
> > +
> > + Assert(beentry);
> > +
> > + io_ops = &beentry->io_path_stats[io_path];
> > + switch (io_op)
> > + {
> > + case IOOP_ALLOC:
> > + pg_atomic_write_u64(&io_ops->allocs,
> > + pg_atomic_read_u64(&io_ops->allocs) + 1);
> > + break;
> > + case IOOP_EXTEND:
> > + pg_atomic_write_u64(&io_ops->extends,
> > + pg_atomic_read_u64(&io_ops->extends) + 1);
> > + break;
> > + case IOOP_FSYNC:
> > + pg_atomic_write_u64(&io_ops->fsyncs,
> > + pg_atomic_read_u64(&io_ops->fsyncs) + 1);
> > + break;
> > + case IOOP_WRITE:
> > + pg_atomic_write_u64(&io_ops->writes,
> > + pg_atomic_read_u64(&io_ops->writes) + 1);
> > + break;
> > + }
> > +}
>
> IIRC Thomas Munro had a patch adding a nonatomic_add or such
> somewhere. Perhaps in the recovery readahead thread? Might be worth using
> instead?
>

I've added Thomas' function in a separate commit. I looked for a better
place to add it (I was thinking somewhere in src/backend/utils/misc) but
couldn't find anywhere that made sense.

I also added a call to pgstat_inc_ioop() in ProcessSyncRequests() to capture
when the checkpointer does fsyncs.

I also added pgstat_inc_ioop() calls to callers of smgrwrite() flushing local
buffers. I don't know if that is desirable or not in this patch. They could be
removed if wrappers for smgrwrite() go in and pgstat_inc_ioop() can be called
from within those wrappers.

- Melanie

Attachment Content-Type Size
0002-updates.patch text/x-patch 17.5 KB
0001-Read-only-atomic-s-backend-write-function.patch text/x-patch 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-10-11 20:54:27 Re: pgsql: Adjust configure to insist on Perl version >= 5.8.3.
Previous Message Mark Dilger 2021-10-11 20:20:26 Re: BUG #17212: pg_amcheck fails on checking temporary relations