Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2021-12-15 21:40:27
Message-ID: CAAKRu_bwAmyQ=3SfxG3Jb8rRUBjoMwJtT99D6gG5fqqLmVpCQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

v18 attached.

On Thu, Dec 9, 2021 at 2:17 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> > From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 11 Oct 2021 16:15:06 -0400
> > Subject: [PATCH v17 1/7] Read-only atomic backend write function
> >
> > For counters in shared memory which can be read by any backend but only
> > written to by one backend, an atomic is still needed to protect against
> > torn values, however, pg_atomic_fetch_add_u64() is overkill for
> > incrementing the counter. pg_atomic_inc_counter() is a helper function
> > which can be used to increment these values safely but without
> > unnecessary overhead.
> >
> > Author: Thomas Munro
> > ---
> > src/include/port/atomics.h | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> > index 856338f161..39ffff24dd 100644
> > --- a/src/include/port/atomics.h
> > +++ b/src/include/port/atomics.h
> > @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
> > return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
> > }
> >
> > +/*
> > + * On modern systems this is really just *counter++. On some older systems
> > + * there might be more to it, due to inability to read and write 64 bit values
> > + * atomically.
> > + */
> > +static inline void
> > +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> > +{
> > + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> > +}
>
> I wonder if it's worth putting something in the name indicating that this is
> not actual atomic RMW operation. Perhaps adding _unlocked?
>

Done.

>
> > From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Wed, 24 Nov 2021 10:32:56 -0500
> > Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
> >
> > Add an array of counters in PgBackendStatus which count the buffers
> > allocated, extended, fsynced, and written by a given backend. Each "IO
> > Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> > local, shared, or strategy). "local" and "shared" IO Path counters count
> > operations on local and shared buffers. The "strategy" IO Path counts
> > buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> > The "direct" IO Path counts blocks of IO which are read, written, or
> > fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> > [Local]BufferAlloc()).
> >
> > With this commit, all backends increment a counter in their
> > PgBackendStatus when performing an IO operation. This is in preparation
> > for future commits which will persist these stats upon backend exit and
> > use the counters to provide observability of database IO operations.
> >
> > Note that this commit does not add code to increment the "direct" path.
> > A separate proposed patch [1] which would add wrappers for smgrwrite(),
> > smgrextend(), and smgrimmedsync() would provide a good location to call
> > pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> > users of these functions.
> >
> > [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com
>
> On longer thread it's nice for committers to already have Reviewed-By: in the
> commit message.

Done.

> > diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> > index 7229598822..413cc605f8 100644
> > --- a/src/backend/utils/activity/backend_status.c
> > +++ b/src/backend/utils/activity/backend_status.c
> > @@ -399,6 +399,15 @@ pgstat_bestart(void)
> > lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
> > lbeentry.st_progress_command_target = InvalidOid;
> > lbeentry.st_query_id = UINT64CONST(0);
> > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> > + {
> > + IOOps *io_ops = &lbeentry.io_path_stats[io_path];
> > +
> > + pg_atomic_init_u64(&io_ops->allocs, 0);
> > + pg_atomic_init_u64(&io_ops->extends, 0);
> > + pg_atomic_init_u64(&io_ops->fsyncs, 0);
> > + pg_atomic_init_u64(&io_ops->writes, 0);
> > + }
> >
> > /*
> > * we don't zero st_progress_param here to save cycles; nobody should
>
> nit: I think we nearly always have a blank line before loops

Done.

> > diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> > index 646126edee..93f1b4bcfc 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> > RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
> > }
> >
> > + pgstat_beinit();
> > /*
> > * Initialize local process's access to XLOG.
> > */
>
> nit: same with multi-line comments.

Done.

> > @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> > */
> > CreateAuxProcessResourceOwner();
> >
> > + pgstat_bestart();
> > StartupXLOG();
> > /* Release (and warn about) any buffer pins leaked in StartupXLOG */
> > ReleaseAuxProcessResources(true);
> > @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> > EnablePortalManager();
> >
> > /* Initialize status reporting */
> > - pgstat_beinit();
>
> I'd like to see changes like moving this kind of thing around broken around
> and committed separately. It's much easier to pinpoint breakage if the CF
> breaks after moving just pgstat_beinit() around, rather than when committing
> this considerably larger patch. And reordering subsystem initialization has
> the habit of causing problems...

Done

> > +/* ----------
> > + * IO Stats reporting utility types
> > + * ----------
> > + */
> > +
> > +typedef enum IOOp
> > +{
> > + IOOP_ALLOC,
> > + IOOP_EXTEND,
> > + IOOP_FSYNC,
> > + IOOP_WRITE,
> > +} IOOp;
> > [...]
> > +/*
> > + * Structure for counting all types of IOOps for a live backend.
> > + */
> > +typedef struct IOOps
> > +{
> > + pg_atomic_uint64 allocs;
> > + pg_atomic_uint64 extends;
> > + pg_atomic_uint64 fsyncs;
> > + pg_atomic_uint64 writes;
> > +} IOOps;
>
> To me IOop and IOOps sound to much alike - even though they're really kind of
> separate things. s/IOOps/IOOpCounters/ maybe?

Done.

> > @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> > {
> > Assert(!pgstat_is_shutdown);
> >
> > + /*
> > + * Only need to send stats on IO Ops for IO Paths when a process exits.
> > + * Users requiring IO Ops for both live and exited backends can read from
> > + * live backends' PgBackendStatus and sum this with totals from exited
> > + * backends persisted by the stats collector.
> > + */
> > + pgstat_send_buffers();
>
> Perhaps something like this comment belongs somewhere at the top of the file,
> or in the header, or ...? It's a fairly central design piece, and it's not
> obvious one would need to look in the shutdown hook for it?
>

now in pgstat.h above the declaration of pgstat_send_buffers()

> > +/*
> > + * Before exiting, a backend sends its IO op statistics to the collector so
> > + * that they may be persisted.
> > + */
> > +void
> > +pgstat_send_buffers(void)
> > +{
> > + PgStat_MsgIOPathOps msg;
> > +
> > + PgBackendStatus *beentry = MyBEEntry;
> > +
> > + /*
> > + * Though some backends with type B_INVALID (such as the single-user mode
> > + * process) do initialize and increment IO operations stats, there is no
> > + * spot in the array of IO operations for backends of type B_INVALID. As
> > + * such, do not send these to the stats collector.
> > + */
> > + if (!beentry || beentry->st_backendType == B_INVALID)
> > + return;
>
> Why does single user mode use B_INVALID? That doesn't seem quite right.

I think PgBackendStatus->st_backendType is set from MyBackendType which
isn't set for the single user mode process. What BackendType would you
expect to see?

> > + memset(&msg, 0, sizeof(msg));
> > + msg.backend_type = beentry->st_backendType;
> > +
> > + pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> > + (IOOps *) &beentry->io_path_stats);
> > +
> > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> > + pgstat_send(&msg, sizeof(msg));
> > +}
>
> It seems worth having a path skipping sending the message if there was no IO?

Makes sense. I've updated pgstat_send_buffers() to do a loop after calling
pgstat_sum_io_path_ops() and check if it should skip sending.

I also thought about having pgstat_sum_io_path_ops() return a value to
indicate if everything was 0 -- which could be useful to future callers
potentially?

I didn't do this because I am not sure what the return value would be.
It could be a bool and be true if any IO was done and false if none was
done -- but that doesn't really make sense given the function's name it
would be called like
if (!pgstat_sum_io_path_ops())
return
which I'm not sure is very clear

> > +/*
> > + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> > + * local) to those in the equivalent stats structure for exited backends. Note
> > + * that this adds and doesn't set, so the destination stats structure should be
> > + * zeroed out by the caller initially. This would commonly be used to transfer
> > + * all IO Op stats for all IO Paths for a particular backend type to the
> > + * pgstats structure.
> > + */
> > +void
> > +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> > +{
> > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> > + {
>
> Sacriligeous, but I find io_path a harder to understand variable name for the
> counter than i (or io_path_off or ...) ;)

I've updated almost all my non-standard loop index variable names.

> > +static void
> > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> > +{
> > + PgStatIOOps *src_io_path_ops;
> > + PgStatIOOps *dest_io_path_ops;
> > +
> > + /*
> > + * Subtract 1 from message's BackendType to get a valid index into the
> > + * array of IO Ops which does not include an entry for B_INVALID
> > + * BackendType.
> > + */
> > + Assert(msg->backend_type > B_INVALID);
>
> Probably worth also asserting the upper boundary?

Done.

> > From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Wed, 24 Nov 2021 11:39:48 -0500
> > Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
> >
> > Backends count IO operations for various IO paths in their PgBackendStatus.
> > Upon exit, they send these counts to the stats collector. Prior to this commit,
> > these IO Ops stats would have been reset when the target was "bgwriter".
> >
> > With this commit, target "bgwriter" no longer will cause the IO operations
> > stats to be reset, and the IO operations stats can be reset with new target,
> > "buffers".
> > ---
> > doc/src/sgml/monitoring.sgml | 2 +-
> > src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++--
> > src/backend/utils/activity/backend_status.c | 29 +++++++
> > src/include/pgstat.h | 8 +-
> > src/include/utils/backend_status.h | 2 +
> > 5 files changed, 117 insertions(+), 7 deletions(-)
> >
> > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> > index 62f2a3332b..bda3eef309 100644
> > --- a/doc/src/sgml/monitoring.sgml
> > +++ b/doc/src/sgml/monitoring.sgml
> > @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
> > <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> > </para>
> > <para>
> > - Time at which these statistics were last reset
> > + Time at which these statistics were last reset.
> > </para></entry>
> > </row>
> > </tbody>
>
> Hm?
>
> Shouldn't this new reset target be documented?

It is in the commit adding the view. I didn't include it in this commit
because the pg_stat_buffers view doesn't exist yet, as of this commit,
and I thought it would be odd to mention it in the docs (in this
commit).
As an aside, I shouldn't have left this correction in this commit. I
moved it now to the other one.

> > +/*
> > + * Helper function to collect and send live backends' current IO operations
> > + * stats counters when a stats reset is initiated so that they may be deducted
> > + * from future totals.
> > + */
> > +static void
> > +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> > +{
> > + PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> > +
> > + memset(ops, 0, sizeof(ops));
> > + pgstat_report_live_backend_io_path_ops(ops);
> > +
> > + /*
> > + * Iterate through the array of IO Ops for all IO Paths for each
> > + * BackendType. Because the array does not include a spot for BackendType
> > + * B_INVALID, add 1 to the index when setting backend_type so that there is
> > + * no confusion as to the BackendType with which this reset message
> > + * corresponds.
> > + */
> > + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++)
> > + {
> > + msg->m_backend_resets.backend_type = backend_type_idx + 1;
> > + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> > + sizeof(msg->m_backend_resets.iop));
> > + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> > + }
> > +}
>
> Probably worth explaining why multiple messages are sent?

Done.

> > @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
> > {
> > if (msg->m_resettarget == RESET_BGWRITER)
> > {
> > - /* Reset the global, bgwriter and checkpointer statistics for the cluster. */
> > - memset(&globalStats, 0, sizeof(globalStats));
> > + /*
> > + * Reset the global bgwriter and checkpointer statistics for the
> > + * cluster.
> > + */
> > + memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
> > + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
> > globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
> > }
>
> Oh, is this a live bug?

I don't think it is a bug. globalStats only contained bgwriter and
checkpointer stats and those were all only displayed in
pg_stat_bgwriter(), so memsetting the whole thing seems fine.

> > + /*
> > + * Subtract 1 from the BackendType to arrive at a valid index in the
> > + * array, as it does not contain a spot for B_INVALID BackendType.
> > + */
>
> Instead of repeating a comment about +- 1 in a bunch of places, would it look
> better to have two helper inline functions for this purpose?

Done.

> > +/*
> > +* When adding a new column to the pg_stat_buffers view, add a new enum
> > +* value here above COLUMN_LENGTH.
> > +*/
> > +enum
> > +{
> > + COLUMN_BACKEND_TYPE,
> > + COLUMN_IO_PATH,
> > + COLUMN_ALLOCS,
> > + COLUMN_EXTENDS,
> > + COLUMN_FSYNCS,
> > + COLUMN_WRITES,
> > + COLUMN_RESET_TIME,
> > + COLUMN_LENGTH,
> > +};
>
> COLUMN_LENGTH seems like a fairly generic name...

Changed.

> > From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Wed, 24 Nov 2021 12:20:10 -0500
> > Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
> >
> > Remove stats from pg_stat_bgwriter which are now more clearly expressed
> > in pg_stat_buffers.
> >
> > TODO:
> > - make pg_stat_checkpointer view and move relevant stats into it
> > - add additional stats to pg_stat_bgwriter
>
> When do you think it makes sense to tackle these wrt committing some of the
> patches?

Well, the new stats are a superset of the old stats (no stats have been
removed that are not represented in the new or old views). So, I don't
see that as a blocker for committing these patches.

Since it is weird that pg_stat_bgwriter had mostly checkpointer stats,
I've edited this commit to rename that view to pg_stat_checkpointer.

I have not made a separate view just for maxwritten_clean (presumably
called pg_stat_bgwriter), but I would not be opposed to doing this if
you thought having a view with a single column isn't a problem (in the
event that we don't get around to adding more bgwriter stats right
away).

I noticed after changing the docs on the "bgwriter" target for
pg_stat_reset_shared to say "checkpointer", that it still said "bgwriter" in
src/backend/po/ko.po
src/backend/po/it.po
...
I presume these are automatically updated with some incantation, but I wasn't
sure what it was nor could I find documentation on this.

> > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> > index 6926fc5742..67447f997a 100644
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -2164,7 +2164,6 @@ BufferSync(int flags)
> > if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
> > {
> > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> > - PendingCheckpointerStats.m_buf_written_checkpoints++;
> > num_written++;
> > }
> > }
> > @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
> > */
> > strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
> >
> > - /* Report buffer alloc counts to pgstat */
> > - PendingBgWriterStats.m_buf_alloc += recent_alloc;
> > -
> > /*
> > * If we're not running the LRU scan, just stop after doing the stats
> > * stuff. We mark the saved state invalid so that we can recover sanely
> > @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
> > reusable_buffers++;
> > }
> >
> > - PendingBgWriterStats.m_buf_written_clean += num_written;
> > -
>
> Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
> compilers to warn... Perhaps we should just remove information from the
> tracepoint?

The local variable num_written is used in BgBufferSync() to determine
whether or not to increment maxwritten_clean which is still represented
in the view pg_stat_checkpointer (formerly pg_stat_bgwriter).

A local variable num_written is used in BufferSync() to increment
CheckpointStats.ckpt_bufs_written which is logged in LogCheckpointEnd(),
so I'm not sure that can be removed.

- Melanie

Attachment Content-Type Size
v18-0008-small-comment-correction.patch text/x-patch 909 bytes
v18-0005-Add-buffers-to-pgstat_reset_shared_counters.patch text/x-patch 10.0 KB
v18-0007-Remove-superfluous-bgwriter-stats.patch text/x-patch 22.1 KB
v18-0004-Send-IO-operations-to-stats-collector.patch text/x-patch 11.6 KB
v18-0006-Add-system-view-tracking-IO-ops-per-backend-type.patch text/x-patch 17.6 KB
v18-0003-Add-IO-operation-counters-to-PgBackendStatus.patch text/x-patch 16.1 KB
v18-0001-Read-only-atomic-backend-write-function.patch text/x-patch 1.7 KB
v18-0002-Move-backend-pgstat-initialization-earlier.patch text/x-patch 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-12-15 21:48:09 Re: SQL/JSON: functions
Previous Message Andrew Dunstan 2021-12-15 21:36:26 Re: SQL/JSON: functions