Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-08-25 19:15:27
Message-ID: 20220825191527.ki6wggnawlrl53i6@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote:
> v28 attached.
>
> I've added the new structs I added to typedefs.list.
>
> I've split the commit which adds all of the logic to track
> IO operation statistics into two commits -- one which includes all of
> the code to count IOOps for IOContexts locally in a backend and a second
> which includes all of the code to accumulate and manage these with the
> cumulative stats system.

Thanks!

> A few notes about the commit which adds local IO Operation stats:
>
> - There is a comment above pgstat_io_op_stats_collected() which mentions
> the cumulative stats system even though this commit doesn't engage the
> cumulative stats system. I wasn't sure if it was more or less
> confusing to have two different versions of this comment.

Not worth being worried about...

> - should pgstat_count_io_op() take BackendType as a parameter instead of
> using MyBackendType internally?

I don't forsee a case where a different value would be passed in.

> - pgstat_count_io_op() Assert()s that the passed-in IOOp and IOContext
> are valid for this BackendType, but it doesn't check that all of the
> pending stats which should be zero are zero. I thought this was okay
> because if I did add that zero-check, it would be added to
> pgstat_count_ioop() as well, and we already Assert() there that we can
> count the op. Thus, it doesn't seem like checking that the stats are
> zero would add any additional regression protection.

It's probably ok.

> - I've kept pgstat_io_context_desc() and pgstat_io_op_desc() in the
> commit which adds those types (the local stats commit), however they
> are not used in that commit. I wasn't sure if I should keep them in
> that commit or move them to the first commit using them (the commit
> adding the new view).

> - I've left pgstat_fetch_backend_io_context_ops() in the shared stats
> commit, however it is not used until the commit which adds the view in
> pg_stat_get_io(). I wasn't sure which way seemed better.

Think that's fine.

> Notes on the commit which accumulates IO Operation stats in shared
> memory:
>
> - I've extended the usage of the Assert()s that IO Operation stats that
> should be zero are. Previously we only checked the stats validity when
> querying the view. Now we check it when flushing pending stats and
> when reading the stats file into shared memory.

> Note that the three locations with these validity checks (when
> flushing pending stats, when reading stats file into shared memory,
> and when querying the view) have similar looking code to loop through
> and validate the stats. However, the actual action they perform if the
> stats are valid is different for each site (adding counters together,
> doing a read, setting nulls in a tuple column to true). Also, some of
> these instances have other code interspersed in the loops which would
> require additional looping if separated from this logic. So it was
> difficult to see a way of combining these into a single helper
> function.

All of them seem to repeat something like

> + if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> + !pgstat_io_context_io_op_valid(io_context, io_op))

perhaps those could be combined? Afaics nothing uses pgstat_bktype_io_op_valid
separately.

> Subject: [PATCH v28 3/5] Track IO operation statistics locally
>
> Introduce "IOOp", an IO operation done by a backend, and "IOContext",
> the IO location source or target or IO type done by a backend. For
> example, the checkpointer may write a shared buffer out. This would be
> counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by
> BackendType "checkpointer".
>
> Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
> (local, shared, or strategy) through a call to pgstat_count_io_op().
>
> The primary concern of these statistics is IO operations on data blocks
> during the course of normal database operations. IO done by, for
> example, the archiver or syslogger is not counted in these statistics.

s/is/are/?

> Stats on IOOps for all IOContexts for a backend are counted in a
> backend's local memory. This commit does not expose any functions for
> aggregating or viewing these stats.

s/This commit does not/A subsequent commit will expose/...

> @@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> BufferDesc *bufHdr;
> Block bufBlock;
> bool found;
> + IOContext io_context;
> bool isExtend;
> bool isLocalBuf = SmgrIsTemp(smgr);
>
> @@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> */
> Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID)); /* spinlock not needed */
>
> - bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
> + if (isLocalBuf)
> + {
> + bufBlock = LocalBufHdrGetBlock(bufHdr);
> + io_context = IOCONTEXT_LOCAL;
> + }
> + else
> + {
> + bufBlock = BufHdrGetBlock(bufHdr);
> +
> + if (strategy != NULL)
> + io_context = IOCONTEXT_STRATEGY;
> + else
> + io_context = IOCONTEXT_SHARED;
> + }

There's a isLocalBuf block earlier on, couldn't we just determine the context
there? I guess there's a branch here already, so it's probably fine as is.

> if (isExtend)
> {
> +
> + pgstat_count_io_op(IOOP_EXTEND, io_context);

Spurious newline.

> @@ -2820,9 +2857,12 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
> *
> * If the caller has an smgr reference for the buffer's relation, pass it
> * as the second parameter. If not, pass NULL.
> + *
> + * IOContext will always be IOCONTEXT_SHARED except when a buffer access strategy is
> + * used and the buffer being flushed is a buffer from the strategy ring.
> */
> static void
> -FlushBuffer(BufferDesc *buf, SMgrRelation reln)
> +FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context)

Too long line?

But also, why document the possible values here? Seems likely to get out of
date at some point, and it doesn't seem important to know?

> @@ -3549,6 +3591,8 @@ FlushRelationBuffers(Relation rel)
> localpage,
> false);
>
> + pgstat_count_io_op(IOOP_WRITE, IOCONTEXT_LOCAL);
> +
> buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
> pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
>

Probably not worth doing, but these made me wonder whether there should be a
function for counting N operations at once.

> @@ -212,8 +215,23 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
> if (strategy != NULL)
> {
> buf = GetBufferFromRing(strategy, buf_state);
> - if (buf != NULL)
> + *from_ring = buf != NULL;
> + if (*from_ring)
> + {

Don't really like the if (*from_ring) - why not keep it as buf != NULL? Seems
a bit confusing this way, making it less obvious what's being changed.

> diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
> index 014f644bf9..a3d76599bf 100644
> --- a/src/backend/storage/buffer/localbuf.c
> +++ b/src/backend/storage/buffer/localbuf.c
> @@ -15,6 +15,7 @@
> */
> #include "postgres.h"
>
> +#include "pgstat.h"
> #include "access/parallel.h"
> #include "catalog/catalog.h"
> #include "executor/instrument.h"

Do most other places not put pgstat.h in the alphabetical order of headers?

> @@ -432,6 +432,15 @@ ProcessSyncRequests(void)
> total_elapsed += elapsed;
> processed++;
>
> + /*
> + * Note that if a backend using a BufferAccessStrategy is
> + * forced to do its own fsync (as opposed to the
> + * checkpointer doing it), it will not be counted as an
> + * IOCONTEXT_STRATEGY IOOP_FSYNC and instead will be
> + * counted as an IOCONTEXT_SHARED IOOP_FSYNC.
> + */
> + pgstat_count_io_op(IOOP_FSYNC, IOCONTEXT_SHARED);

Why is this noted here? Perhaps just point to the place where that happens
instead? I think it's also documented in ForwardSyncRequest()? Or just only
mention it there...

> @@ -0,0 +1,191 @@
> +/* -------------------------------------------------------------------------
> + *
> + * pgstat_io_ops.c
> + * Implementation of IO operation statistics.
> + *
> + * This file contains the implementation of IO operation statistics. It is kept
> + * separate from pgstat.c to enforce the line between the statistics access /
> + * storage implementation and the details about individual types of
> + * statistics.
> + *
> + * Copyright (c) 2001-2022, PostgreSQL Global Development Group

Arguably this would just be 2021-2022

> +void
> +pgstat_count_io_op(IOOp io_op, IOContext io_context)
> +{
> + PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_context];
> +
> + Assert(pgstat_expect_io_op(MyBackendType, io_context, io_op));
> +
> + switch (io_op)
> + {
> + case IOOP_ALLOC:
> + pending_counters->allocs++;
> + break;
> + case IOOP_EXTEND:
> + pending_counters->extends++;
> + break;
> + case IOOP_FSYNC:
> + pending_counters->fsyncs++;
> + break;
> + case IOOP_READ:
> + pending_counters->reads++;
> + break;
> + case IOOP_WRITE:
> + pending_counters->writes++;
> + break;
> + }
> +
> +}

How about replacing the breaks with a return and then erroring out if we reach
the end of the function? You did that below, and I think it makes sense.

> +bool
> +pgstat_bktype_io_context_valid(BackendType bktype, IOContext io_context)
> +{

Maybe add a tiny comment about what 'valid' means here? Something like
'return whether the backend type counts io in io_context'.

> + /*
> + * Only regular backends and WAL Sender processes executing queries should
> + * use local buffers.
> + */
> + no_local = bktype == B_AUTOVAC_LAUNCHER || bktype ==
> + B_BG_WRITER || bktype == B_CHECKPOINTER || bktype ==
> + B_AUTOVAC_WORKER || bktype == B_BG_WORKER || bktype ==
> + B_STANDALONE_BACKEND || bktype == B_STARTUP;

I think BG_WORKERS could end up using local buffers, extensions can do just
about everything in them.

> +bool
> +pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op)
> +{
> + if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) && io_op ==
> + IOOP_READ)
> + return false;

Perhaps we should add an assertion about the backend type making sense here?
I.e. that it's not archiver, walwriter etc?

> +bool
> +pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op)
> +{
> + /*
> + * Temporary tables using local buffers are not logged and thus do not
> + * require fsync'ing. Set this cell to NULL to differentiate between an
> + * invalid combination and 0 observed IO Operations.

This comment feels a bit out of place?

> +bool
> +pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op)
> +{
> + if (!pgstat_io_op_stats_collected(bktype))
> + return false;
> +
> + if (!pgstat_bktype_io_context_valid(bktype, io_context))
> + return false;
> +
> + if (!pgstat_bktype_io_op_valid(bktype, io_op))
> + return false;
> +
> + if (!pgstat_io_context_io_op_valid(io_context, io_op))
> + return false;
> +
> + /*
> + * There are currently no cases of a BackendType, IOContext, IOOp
> + * combination that are specifically invalid.
> + */

"specifically"?

> From 0f141fa7f97a57b8628b1b6fd6029bd3782f16a1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 22 Aug 2022 11:35:20 -0400
> Subject: [PATCH v28 4/5] Aggregate IO operation stats per BackendType
>
> Stats on IOOps for all IOContexts for a backend are tracked locally. Add
> functionality for backends to flush these stats to shared memory and
> accumulate them with those from all other backends, exited and live.
> Also add reset and snapshot functions used by cumulative stats system
> for management of these statistics.
>
> The aggregated stats in shared memory could be extended in the future
> with per-backend stats -- useful for per connection IO statistics and
> monitoring.
>
> Some BackendTypes will not flush their pending statistics at regular
> intervals and explicitly call pgstat_flush_io_ops() during the course of
> normal operations to flush their backend-local IO Operation statistics
> to shared memory in a timely manner.

> Because not all BackendType, IOOp, IOContext combinations are valid, the
> validity of the stats are checked before flushing pending stats and
> before reading in the existing stats file to shared memory.

s/are checked/is checked/?

> @@ -1486,6 +1507,42 @@ pgstat_read_statsfile(void)
> if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
> goto error;
>
> + /*
> + * Read IO Operations stats struct
> + */
> + if (!read_chunk_s(fpin, &shmem->io_ops.stat_reset_timestamp))
> + goto error;
> +
> + for (int backend_type = 0; backend_type < BACKEND_NUM_TYPES; backend_type++)
> + {
> + PgStatShared_IOContextOps *backend_io_context_ops = &shmem->io_ops.stats[backend_type];
> + bool expect_backend_stats = true;
> +
> + if (!pgstat_io_op_stats_collected(backend_type))
> + expect_backend_stats = false;
> +
> + for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> + {
> + if (!expect_backend_stats ||
> + !pgstat_bktype_io_context_valid(backend_type, io_context))
> + {
> + pgstat_io_context_ops_assert_zero(&backend_io_context_ops->data[io_context]);
> + continue;
> + }
> +
> + for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> + {
> + if (!pgstat_bktype_io_op_valid(backend_type, io_op) ||
> + !pgstat_io_context_io_op_valid(io_context, io_op))
> + pgstat_io_op_assert_zero(&backend_io_context_ops->data[io_context],
> + io_op);
> + }
> + }
> +
> + if (!read_chunk_s(fpin, &backend_io_context_ops->data))
> + goto error;
> + }

Could we put the validation out of line? That's a lot of io stats specific
code to be in pgstat_read_statsfile().

> +/*
> + * Helper function to accumulate PgStat_IOOpCounters. If either of the
> + * passed-in PgStat_IOOpCounters are members of PgStatShared_IOContextOps, the
> + * caller is responsible for ensuring that the appropriate lock is held. This
> + * is not asserted because this function could plausibly be used to accumulate
> + * two local/pending PgStat_IOOpCounters.

What's "this" here?

> + */
> +static void
> +pgstat_accum_io_op(PgStat_IOOpCounters *shared, PgStat_IOOpCounters *local, IOOp io_op)

Given that the comment above says both of them may be local, it's a bit odd to
call it 'shared' here...

> +PgStat_BackendIOContextOps *
> +pgstat_fetch_backend_io_context_ops(void)
> +{
> + pgstat_snapshot_fixed(PGSTAT_KIND_IOOPS);
> +
> + return &pgStatLocal.snapshot.io_ops;
> +}

Not for this patch series, but we really should replace this set of functions
with storing the relevant offset in the kind_info.

> @@ -496,6 +503,8 @@ extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
> */
>
> extern void pgstat_count_io_op(IOOp io_op, IOContext io_context);
> +extern PgStat_BackendIOContextOps *pgstat_fetch_backend_io_context_ops(void);
> +extern bool pgstat_flush_io_ops(bool nowait);
> extern const char *pgstat_io_context_desc(IOContext io_context);
> extern const char *pgstat_io_op_desc(IOOp io_op);
>

Is there any call to pgstat_flush_io_ops() from outside pgstat*.c? So possibly
it could be in pgstat_internal.h? Not that it's particularly important...

> @@ -506,6 +515,43 @@ extern bool pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op);
> extern bool pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op);
> extern bool pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op);
>
> +/*
> + * Functions to assert that invalid IO Operation counters are zero. Used with
> + * the validation functions in pgstat_io_ops.c
> + */
> +static inline void
> +pgstat_io_context_ops_assert_zero(PgStat_IOOpCounters *counters)
> +{
> + Assert(counters->allocs == 0 && counters->extends == 0 &&
> + counters->fsyncs == 0 && counters->reads == 0 &&
> + counters->writes == 0);
> +}
> +
> +static inline void
> +pgstat_io_op_assert_zero(PgStat_IOOpCounters *counters, IOOp io_op)
> +{
> + switch (io_op)
> + {
> + case IOOP_ALLOC:
> + Assert(counters->allocs == 0);
> + return;
> + case IOOP_EXTEND:
> + Assert(counters->extends == 0);
> + return;
> + case IOOP_FSYNC:
> + Assert(counters->fsyncs == 0);
> + return;
> + case IOOP_READ:
> + Assert(counters->reads == 0);
> + return;
> + case IOOP_WRITE:
> + Assert(counters->writes == 0);
> + return;
> + }
> +
> + elog(ERROR, "unrecognized IOOp value: %d", io_op);

Hm. This means it'll emit code even in non-assertion builds - this should
probably just be an Assert(false) or pg_unreachable().

> Subject: [PATCH v28 5/5] Add system view tracking IO ops per backend type

> View stats are fetched from statistics incremented when a backend
> performs an IO Operation and maintained by the cumulative statistics
> subsystem.

"fetched from statistics incremented"?

> Each row of the view is stats for a particular BackendType for a
> particular IOContext (e.g. shared buffer accesses by checkpointer) and
> each column in the view is the total number of IO Operations done (e.g.
> writes).

s/is/shows/?

s/for a particular BackendType for a particular IOContext/for a particularl
BackendType and IOContext/? Somehow the repetition is weird.

> Note that some of the cells in the view are redundant with fields in
> pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in
> pg_stat_bgwriter for backwards compatibility. Deriving the redundant
> pg_stat_bgwriter stats from the IO operations stats structures was also
> problematic due to the separate reset targets for 'bgwriter' and
> 'io'.

I suspect we should still consider doing that in the future, perhaps by
documenting that the relevant fields in pg_stat_bgwriter aren't reset by the
'bgwriter' target anymore? And noting that reliance on those fields is
"deprecated" and that pg_stat_io should be used instead?

> Suggested by Andres Freund
>
> Author: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Reviewed-by: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
> ---
> doc/src/sgml/monitoring.sgml | 115 ++++++++++++++-
> src/backend/catalog/system_views.sql | 12 ++
> src/backend/utils/adt/pgstatfuncs.c | 100 +++++++++++++
> src/include/catalog/pg_proc.dat | 9 ++
> src/test/regress/expected/rules.out | 9 ++
> src/test/regress/expected/stats.out | 201 +++++++++++++++++++++++++++
> src/test/regress/sql/stats.sql | 103 ++++++++++++++
> 7 files changed, 548 insertions(+), 1 deletion(-)
>
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 9440b41770..9949011ba3 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -448,6 +448,15 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
> </entry>
> </row>
>
> + <row>
> + <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> + <entry>A row for each IO Context for each backend type showing
> + statistics about backend IO operations. See
> + <link linkend="monitoring-pg-stat-io-view">
> + <structname>pg_stat_io</structname></link> for details.
> + </entry>
> + </row>

The "for each for each" thing again :)

> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>io_context</structfield> <type>text</type>
> + </para>
> + <para>
> + IO Context used (e.g. shared buffers, direct).
> + </para></entry>
> + </row>

Wrong list of contexts.

> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>alloc</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of buffers allocated.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>extend</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of blocks extended.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>fsync</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of blocks fsynced.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>read</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of blocks read.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>write</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of blocks written.
> + </para></entry>
> + </row>

> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> + </para>
> + <para>
> + Time at which these statistics were last reset.
> </para></entry>
> </row>
> </tbody>

Part of me thinks it'd be nicer if it were "allocated, read, written, extended,
fsynced, stats_reset", instead of alphabetical order. The order already isn't
alphabetical.

> + /*
> + * When adding a new column to the pg_stat_io view, add a new enum value
> + * here above IO_NUM_COLUMNS.
> + */
> + enum
> + {
> + IO_COLUMN_BACKEND_TYPE,
> + IO_COLUMN_IO_CONTEXT,
> + IO_COLUMN_ALLOCS,
> + IO_COLUMN_EXTENDS,
> + IO_COLUMN_FSYNCS,
> + IO_COLUMN_READS,
> + IO_COLUMN_WRITES,
> + IO_COLUMN_RESET_TIME,
> + IO_NUM_COLUMNS,
> + };

Given it's local and some of the lines are long, maybe just use COL?

> +#define IO_COLUMN_IOOP_OFFSET (IO_COLUMN_IO_CONTEXT + 1)

Undef'ing it probably worth doing.

> + SetSingleFuncCall(fcinfo, 0);
> + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +
> + backends_io_stats = pgstat_fetch_backend_io_context_ops();
> +
> + reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> +
> + for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
> + {
> + Datum bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
> + bool expect_backend_stats = true;
> + PgStat_IOContextOps *io_context_ops = &backends_io_stats->stats[bktype];
> +
> + /*
> + * For those BackendTypes without IO Operation stats, skip
> + * representing them in the view altogether.
> + */
> + if (!pgstat_io_op_stats_collected(bktype))
> + expect_backend_stats = false;

Why not just expect_backend_stats = pgstat_io_op_stats_collected()?

> + for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> + {
> + PgStat_IOOpCounters *counters = &io_context_ops->data[io_context];
> + Datum values[IO_NUM_COLUMNS];
> + bool nulls[IO_NUM_COLUMNS];
> +
> + /*
> + * Some combinations of IOCONTEXT and BackendType are not valid
> + * for any type of IO Operation. In such cases, omit the entire
> + * row from the view.
> + */
> + if (!expect_backend_stats ||
> + !pgstat_bktype_io_context_valid(bktype, io_context))
> + {
> + pgstat_io_context_ops_assert_zero(counters);
> + continue;
> + }
> +
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));

I'd replace the memset with values[...] = {0} etc.

> + values[IO_COLUMN_BACKEND_TYPE] = bktype_desc;
> + values[IO_COLUMN_IO_CONTEXT] = CStringGetTextDatum(
> + pgstat_io_context_desc(io_context));

Pgindent, I hate you.

Perhaps put it the context desc in a local var, so it doesn't look quite this
ugly?

> + values[IO_COLUMN_ALLOCS] = Int64GetDatum(counters->allocs);
> + values[IO_COLUMN_EXTENDS] = Int64GetDatum(counters->extends);
> + values[IO_COLUMN_FSYNCS] = Int64GetDatum(counters->fsyncs);
> + values[IO_COLUMN_READS] = Int64GetDatum(counters->reads);
> + values[IO_COLUMN_WRITES] = Int64GetDatum(counters->writes);
> + values[IO_COLUMN_RESET_TIME] = TimestampTzGetDatum(reset_time);
> +
> +
> + /*
> + * Some combinations of BackendType and IOOp and of IOContext and
> + * IOOp are not valid. Set these cells in the view NULL and assert
> + * that these stats are zero as expected.
> + */
> + for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> + {
> + if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> + !pgstat_io_context_io_op_valid(io_context, io_op))
> + {
> + pgstat_io_op_assert_zero(counters, io_op);
> + nulls[io_op + IO_COLUMN_IOOP_OFFSET] = true;
> + }
> + }

A bit weird that we first assign a value and then set nulls separately. But
it's not obvious how to make it look nice otherwise.

> +-- Test that allocs, extends, reads, and writes to Shared Buffers and fsyncs
> +-- done to ensure durability of Shared Buffers are tracked in pg_stat_io.
> +SELECT sum(alloc) AS io_sum_shared_allocs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(extend) AS io_sum_shared_extends_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(fsync) AS io_sum_shared_fsyncs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(read) AS io_sum_shared_reads_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(write) AS io_sum_shared_writes_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +-- Create a regular table and insert some data to generate IOCONTEXT_SHARED allocs and extends.
> +CREATE TABLE test_io_shared(a int);
> +INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
> +SELECT pg_stat_force_next_flush();
> + pg_stat_force_next_flush
> +--------------------------
> +
> +(1 row)
> +
> +-- After a checkpoint, there should be some additional IOCONTEXT_SHARED writes and fsyncs.
> +CHECKPOINT;

Does that work reliably? A checkpoint could have started just before the
CREATE TABLE, I think? Then it'd not have flushed those writes yet. I think
doing two checkpoints would protect against that.

> +DROP TABLE test_io_shared;
> +DROP TABLESPACE test_io_shared_stats_tblspc;

Tablespace creation is somewhat expensive, do we really need that? There
should be one set up in setup.sql or such.

> +-- Test that allocs, extends, reads, and writes of temporary tables are tracked
> +-- in pg_stat_io.
> +CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
> +SELECT sum(alloc) AS io_sum_local_allocs_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +-- Insert enough values that we need to reuse and write out dirty local
> +-- buffers.
> +INSERT INTO test_io_local SELECT generate_series(1, 80000) as id,
> +'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

Could be abbreviated with repeat('a', some-number) :P

Can the table be smaller than this? That might show up on a slow machine.

> +SELECT sum(alloc) AS io_sum_local_allocs_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT :io_sum_local_allocs_after > :io_sum_local_allocs_before;

Random q: Why are we uppercasing the first letter of the context?

> +CREATE TABLE test_io_strategy(a INT, b INT);
> +ALTER TABLE test_io_strategy SET (autovacuum_enabled = 'false');

I think you can specify that as part of the CREATE TABLE. Not sure if
otherwise there's not a race where autovac coul start before you do the ALTER.

> +INSERT INTO test_io_strategy SELECT i, i from generate_series(1, 8000)i;
> +-- Ensure that the next VACUUM will need to perform IO by rewriting the table
> +-- first with VACUUM (FULL).

... because VACUUM FULL currently doesn't set all-visible etc on the pages,
which the subsequent vacuum will then do.

> +-- Hope that the previous value of wal_skip_threshold was the default. We
> +-- can't use BEGIN...SET LOCAL since VACUUM can't be run inside a transaction
> +-- block.
> +RESET wal_skip_threshold;

Nothing in this file set it before, so that's a pretty sure-to-be-fulfilled
hope.

> +-- Test that, when using a Strategy, if creating a relation, Strategy extends

s/if/when/?

Looks good!

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2022-08-25 19:22:37 Re: postgres_fdw hint messages
Previous Message Joe Conway 2022-08-25 19:03:27 Re: has_privs_of_role vs. is_member_of_role, redux