Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2021-12-09 19:17:52
Message-ID: 20211209191752.dupgg4omrc7gck2l@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 11 Oct 2021 16:15:06 -0400
> Subject: [PATCH v17 1/7] Read-only atomic backend write function
>
> For counters in shared memory which can be read by any backend but only
> written to by one backend, an atomic is still needed to protect against
> torn values, however, pg_atomic_fetch_add_u64() is overkill for
> incrementing the counter. pg_atomic_inc_counter() is a helper function
> which can be used to increment these values safely but without
> unnecessary overhead.
>
> Author: Thomas Munro
> ---
> src/include/port/atomics.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 856338f161..39ffff24dd 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
> return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
> }
>
> +/*
> + * On modern systems this is really just *counter++. On some older systems
> + * there might be more to it, due to inability to read and write 64 bit values
> + * atomically.
> + */
> +static inline void
> +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> +{
> + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> +}

I wonder if it's worth putting something in the name indicating that this is
not actual atomic RMW operation. Perhaps adding _unlocked?

> From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 10:32:56 -0500
> Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
>
> Add an array of counters in PgBackendStatus which count the buffers
> allocated, extended, fsynced, and written by a given backend. Each "IO
> Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> local, shared, or strategy). "local" and "shared" IO Path counters count
> operations on local and shared buffers. The "strategy" IO Path counts
> buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> The "direct" IO Path counts blocks of IO which are read, written, or
> fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> [Local]BufferAlloc()).
>
> With this commit, all backends increment a counter in their
> PgBackendStatus when performing an IO operation. This is in preparation
> for future commits which will persist these stats upon backend exit and
> use the counters to provide observability of database IO operations.
>
> Note that this commit does not add code to increment the "direct" path.
> A separate proposed patch [1] which would add wrappers for smgrwrite(),
> smgrextend(), and smgrimmedsync() would provide a good location to call
> pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> users of these functions.
>
> [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com

On longer thread it's nice for committers to already have Reviewed-By: in the
commit message.

> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> index 7229598822..413cc605f8 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -399,6 +399,15 @@ pgstat_bestart(void)
> lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
> lbeentry.st_progress_command_target = InvalidOid;
> lbeentry.st_query_id = UINT64CONST(0);
> + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> + {
> + IOOps *io_ops = &lbeentry.io_path_stats[io_path];
> +
> + pg_atomic_init_u64(&io_ops->allocs, 0);
> + pg_atomic_init_u64(&io_ops->extends, 0);
> + pg_atomic_init_u64(&io_ops->fsyncs, 0);
> + pg_atomic_init_u64(&io_ops->writes, 0);
> + }
>
> /*
> * we don't zero st_progress_param here to save cycles; nobody should

nit: I think we nearly always have a blank line before loops

> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 646126edee..93f1b4bcfc 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
> }
>
> + pgstat_beinit();
> /*
> * Initialize local process's access to XLOG.
> */

nit: same with multi-line comments.

> @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> */
> CreateAuxProcessResourceOwner();
>
> + pgstat_bestart();
> StartupXLOG();
> /* Release (and warn about) any buffer pins leaked in StartupXLOG */
> ReleaseAuxProcessResources(true);
> @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> EnablePortalManager();
>
> /* Initialize status reporting */
> - pgstat_beinit();

I'd like to see changes like moving this kind of thing around broken around
and committed separately. It's much easier to pinpoint breakage if the CF
breaks after moving just pgstat_beinit() around, rather than when committing
this considerably larger patch. And reordering subsystem initialization has
the habit of causing problems...

> +/* ----------
> + * IO Stats reporting utility types
> + * ----------
> + */
> +
> +typedef enum IOOp
> +{
> + IOOP_ALLOC,
> + IOOP_EXTEND,
> + IOOP_FSYNC,
> + IOOP_WRITE,
> +} IOOp;
> [...]
> +/*
> + * Structure for counting all types of IOOps for a live backend.
> + */
> +typedef struct IOOps
> +{
> + pg_atomic_uint64 allocs;
> + pg_atomic_uint64 extends;
> + pg_atomic_uint64 fsyncs;
> + pg_atomic_uint64 writes;
> +} IOOps;

To me IOop and IOOps sound to much alike - even though they're really kind of
separate things. s/IOOps/IOOpCounters/ maybe?

> @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> {
> Assert(!pgstat_is_shutdown);
>
> + /*
> + * Only need to send stats on IO Ops for IO Paths when a process exits.
> + * Users requiring IO Ops for both live and exited backends can read from
> + * live backends' PgBackendStatus and sum this with totals from exited
> + * backends persisted by the stats collector.
> + */
> + pgstat_send_buffers();

Perhaps something like this comment belongs somewhere at the top of the file,
or in the header, or ...? It's a fairly central design piece, and it's not
obvious one would need to look in the shutdown hook for it?

> +/*
> + * Before exiting, a backend sends its IO op statistics to the collector so
> + * that they may be persisted.
> + */
> +void
> +pgstat_send_buffers(void)
> +{
> + PgStat_MsgIOPathOps msg;
> +
> + PgBackendStatus *beentry = MyBEEntry;
> +
> + /*
> + * Though some backends with type B_INVALID (such as the single-user mode
> + * process) do initialize and increment IO operations stats, there is no
> + * spot in the array of IO operations for backends of type B_INVALID. As
> + * such, do not send these to the stats collector.
> + */
> + if (!beentry || beentry->st_backendType == B_INVALID)
> + return;

Why does single user mode use B_INVALID? That doesn't seem quite right.

> + memset(&msg, 0, sizeof(msg));
> + msg.backend_type = beentry->st_backendType;
> +
> + pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> + (IOOps *) &beentry->io_path_stats);
> +
> + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> + pgstat_send(&msg, sizeof(msg));
> +}

It seems worth having a path skipping sending the message if there was no IO?

> +/*
> + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> + * local) to those in the equivalent stats structure for exited backends. Note
> + * that this adds and doesn't set, so the destination stats structure should be
> + * zeroed out by the caller initially. This would commonly be used to transfer
> + * all IO Op stats for all IO Paths for a particular backend type to the
> + * pgstats structure.
> + */
> +void
> +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> +{
> + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> + {

Sacriligeous, but I find io_path a harder to understand variable name for the
counter than i (or io_path_off or ...) ;)

> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> + PgStatIOOps *src_io_path_ops;
> + PgStatIOOps *dest_io_path_ops;
> +
> + /*
> + * Subtract 1 from message's BackendType to get a valid index into the
> + * array of IO Ops which does not include an entry for B_INVALID
> + * BackendType.
> + */
> + Assert(msg->backend_type > B_INVALID);

Probably worth also asserting the upper boundary?

> From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 11:39:48 -0500
> Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
>
> Backends count IO operations for various IO paths in their PgBackendStatus.
> Upon exit, they send these counts to the stats collector. Prior to this commit,
> these IO Ops stats would have been reset when the target was "bgwriter".
>
> With this commit, target "bgwriter" no longer will cause the IO operations
> stats to be reset, and the IO operations stats can be reset with new target,
> "buffers".
> ---
> doc/src/sgml/monitoring.sgml | 2 +-
> src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++--
> src/backend/utils/activity/backend_status.c | 29 +++++++
> src/include/pgstat.h | 8 +-
> src/include/utils/backend_status.h | 2 +
> 5 files changed, 117 insertions(+), 7 deletions(-)
>
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 62f2a3332b..bda3eef309 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
> <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> </para>
> <para>
> - Time at which these statistics were last reset
> + Time at which these statistics were last reset.
> </para></entry>
> </row>
> </tbody>

Hm?

Shouldn't this new reset target be documented?

> +/*
> + * Helper function to collect and send live backends' current IO operations
> + * stats counters when a stats reset is initiated so that they may be deducted
> + * from future totals.
> + */
> +static void
> +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> +{
> + PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> + memset(ops, 0, sizeof(ops));
> + pgstat_report_live_backend_io_path_ops(ops);
> +
> + /*
> + * Iterate through the array of IO Ops for all IO Paths for each
> + * BackendType. Because the array does not include a spot for BackendType
> + * B_INVALID, add 1 to the index when setting backend_type so that there is
> + * no confusion as to the BackendType with which this reset message
> + * corresponds.
> + */
> + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++)
> + {
> + msg->m_backend_resets.backend_type = backend_type_idx + 1;
> + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> + sizeof(msg->m_backend_resets.iop));
> + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> + }
> +}

Probably worth explaining why multiple messages are sent?

> @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
> {
> if (msg->m_resettarget == RESET_BGWRITER)
> {
> - /* Reset the global, bgwriter and checkpointer statistics for the cluster. */
> - memset(&globalStats, 0, sizeof(globalStats));
> + /*
> + * Reset the global bgwriter and checkpointer statistics for the
> + * cluster.
> + */
> + memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
> + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
> globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
> }

Oh, is this a live bug?

> + /*
> + * Subtract 1 from the BackendType to arrive at a valid index in the
> + * array, as it does not contain a spot for B_INVALID BackendType.
> + */

Instead of repeating a comment about +- 1 in a bunch of places, would it look
better to have two helper inline functions for this purpose?

> +/*
> +* When adding a new column to the pg_stat_buffers view, add a new enum
> +* value here above COLUMN_LENGTH.
> +*/
> +enum
> +{
> + COLUMN_BACKEND_TYPE,
> + COLUMN_IO_PATH,
> + COLUMN_ALLOCS,
> + COLUMN_EXTENDS,
> + COLUMN_FSYNCS,
> + COLUMN_WRITES,
> + COLUMN_RESET_TIME,
> + COLUMN_LENGTH,
> +};

COLUMN_LENGTH seems like a fairly generic name...

> From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 12:20:10 -0500
> Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
>
> Remove stats from pg_stat_bgwriter which are now more clearly expressed
> in pg_stat_buffers.
>
> TODO:
> - make pg_stat_checkpointer view and move relevant stats into it
> - add additional stats to pg_stat_bgwriter

When do you think it makes sense to tackle these wrt committing some of the
patches?

> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 6926fc5742..67447f997a 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2164,7 +2164,6 @@ BufferSync(int flags)
> if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
> {
> TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> - PendingCheckpointerStats.m_buf_written_checkpoints++;
> num_written++;
> }
> }
> @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
> */
> strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
>
> - /* Report buffer alloc counts to pgstat */
> - PendingBgWriterStats.m_buf_alloc += recent_alloc;
> -
> /*
> * If we're not running the LRU scan, just stop after doing the stats
> * stuff. We mark the saved state invalid so that we can recover sanely
> @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
> reusable_buffers++;
> }
>
> - PendingBgWriterStats.m_buf_written_clean += num_written;
> -

Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
compilers to warn... Perhaps we should just remove information from the
tracepoint?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-12-09 19:28:18 do only critical work during single-user vacuum?
Previous Message Andres Freund 2021-12-09 18:41:50 Re: port conflicts when running tests concurrently on windows.