Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-07-06 19:20:47
Message-ID: 20220706192047.sikr2vuwvpslqdrw@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-07-05 13:24:55 -0400, Melanie Plageman wrote:
> From 2d089e26236c55d1be5b93833baa0cf7667ba38d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Tue, 28 Jun 2022 11:33:04 -0400
> Subject: [PATCH v22 1/3] Add BackendType for standalone backends
>
> All backends should have a BackendType to enable statistics reporting
> per BackendType.
>
> Add a new BackendType for standalone backends, B_STANDALONE_BACKEND (and
> alphabetize the BackendTypes). Both the bootstrap backend and single
> user mode backends will have BackendType B_STANDALONE_BACKEND.
>
> Author: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Discussion: https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com
> ---
> src/backend/utils/init/miscinit.c | 17 +++++++++++------
> src/include/miscadmin.h | 5 +++--
> 2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
> index eb43b2c5e5..07e6db1a1c 100644
> --- a/src/backend/utils/init/miscinit.c
> +++ b/src/backend/utils/init/miscinit.c
> @@ -176,6 +176,8 @@ InitStandaloneProcess(const char *argv0)
> {
> Assert(!IsPostmasterEnvironment);
>
> + MyBackendType = B_STANDALONE_BACKEND;

Hm. This is used for singleuser mode as well as bootstrap. Should we
split those? It's not like bootstrap mode really matters for stats, so
I'm inclined not to.

> @@ -375,6 +376,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
> * out the initial relation mapping files.
> */
> RelationMapFinishBootstrap();
> + // TODO: should this be done for bootstrap?
> + pgstat_report_io_ops();

Hm. Not particularly useful, but also not harmful. But we don't need an
explicit call, because it'll be done at process exit too. At least I
think, it could be that it's different for bootstrap.

> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 2e146aac93..e6dbb1c4bb 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -1712,6 +1712,9 @@ AutoVacWorkerMain(int argc, char *argv[])
> recentXid = ReadNextTransactionId();
> recentMulti = ReadNextMultiXactId();
> do_autovacuum();
> +
> + // TODO: should this be done more often somewhere in do_autovacuum()?
> + pgstat_report_io_ops();
> }

Don't think you need all these calls before process exit - it'll happen
via pgstat_shutdown_hook().

IMO it'd be a good idea to add pgstat_report_io_ops() to
pgstat_report_vacuum()/analyze(), so that the stats for a longrunning
autovac worker get updated more regularly.

> diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
> index 91e6f6ea18..87e4b9e9bd 100644
> --- a/src/backend/postmaster/bgwriter.c
> +++ b/src/backend/postmaster/bgwriter.c
> @@ -242,6 +242,7 @@ BackgroundWriterMain(void)
>
> /* Report pending statistics to the cumulative stats system */
> pgstat_report_bgwriter();
> + pgstat_report_io_ops();
>
> if (FirstCallSinceLastCheckpoint())
> {

How about moving the pgstat_report_io_ops() into
pgstat_report_bgwriter(), pgstat_report_autovacuum() etc? Seems
unnecessary to have multiple pgstat_* calls in these places.

> +/*
> + * Flush out locally pending IO Operation statistics entries
> + *
> + * If nowait is true, this function returns false on lock failure. Otherwise
> + * this function always returns true. Writer processes are mutually excluded
> + * using LWLock, but readers are expected to use change-count protocol to avoid
> + * interference with writers.
> + *
> + * If nowait is true, this function returns true if the lock could not be
> + * acquired. Otherwise return false.
> + *
> + */
> +bool
> +pgstat_flush_io_ops(bool nowait)
> +{
> + PgStat_IOPathOps *dest_io_path_ops;
> + PgStatShared_BackendIOPathOps *stats_shmem;
> +
> + PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!have_ioopstats)
> + return false;
> +
> + if (!beentry || beentry->st_backendType == B_INVALID)
> + return false;
> +
> + stats_shmem = &pgStatLocal.shmem->io_ops;
> +
> + if (!nowait)
> + LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
> + else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE))
> + return true;

Wonder if it's worth making the lock specific to the backend type?

> + dest_io_path_ops =
> + &stats_shmem->stats[backend_type_get_idx(beentry->st_backendType)];
> +

This could be done before acquiring the lock, right?

> +void
> +pgstat_io_ops_snapshot_cb(void)
> +{
> + PgStatShared_BackendIOPathOps *stats_shmem = &pgStatLocal.shmem->io_ops;
> + PgStat_IOPathOps *snapshot_ops = pgStatLocal.snapshot.io_path_ops;
> + PgStat_IOPathOps *reset_ops;
> +
> + PgStat_IOPathOps *reset_offset = stats_shmem->reset_offset;
> + PgStat_IOPathOps reset[BACKEND_NUM_TYPES];
> +
> + pgstat_copy_changecounted_stats(snapshot_ops,
> + &stats_shmem->stats, sizeof(stats_shmem->stats),
> + &stats_shmem->changecount);

This doesn't make sense - with multiple writers you can't use the
changecount approach (and you don't in the flush part above).

> + LWLockAcquire(&stats_shmem->lock, LW_SHARED);
> + memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
> + LWLockRelease(&stats_shmem->lock);

Which then also means that you don't need the reset offset stuff. It's
only there because with the changecount approach we can't take a lock to
reset the stats (since there is no lock). With a lock you can just reset
the shared state.

> +void
> +pgstat_count_io_op(IOOp io_op, IOPath io_path)
> +{
> + PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_path];
> + PgStat_IOOpCounters *cumulative_counters =
> + &cumulative_IOOpStats.data[io_path];

the pending_/cumultive_ prefix before an uppercase-first camelcase name
seems ugly...

> + switch (io_op)
> + {
> + case IOOP_ALLOC:
> + pending_counters->allocs++;
> + cumulative_counters->allocs++;
> + break;
> + case IOOP_EXTEND:
> + pending_counters->extends++;
> + cumulative_counters->extends++;
> + break;
> + case IOOP_FSYNC:
> + pending_counters->fsyncs++;
> + cumulative_counters->fsyncs++;
> + break;
> + case IOOP_WRITE:
> + pending_counters->writes++;
> + cumulative_counters->writes++;
> + break;
> + }
> +
> + have_ioopstats = true;
> +}

Doing two math ops / memory accesses every time seems off. Seems better
to maintain cumultive_counters whenever reporting stats, just before
zeroing pending_counters?

> +/*
> + * Report IO operation statistics
> + *
> + * This works in much the same way as pgstat_flush_io_ops() but is meant for
> + * BackendTypes like bgwriter for whom pgstat_report_stat() will not be called
> + * frequently enough to keep shared memory stats fresh.
> + * Backends not typically calling pgstat_report_stat() can invoke
> + * pgstat_report_io_ops() explicitly.
> + */
> +void
> +pgstat_report_io_ops(void)
> +{

This shouldn't be needed - the flush function above can be used.

> + PgStat_IOPathOps *dest_io_path_ops;
> + PgStatShared_BackendIOPathOps *stats_shmem;
> +
> + PgBackendStatus *beentry = MyBEEntry;
> +
> + Assert(!pgStatLocal.shmem->is_shutdown);
> + pgstat_assert_is_up();
> +
> + if (!have_ioopstats)
> + return;
> +
> + if (!beentry || beentry->st_backendType == B_INVALID)
> + return;

Is there a case where this may be called where we have no beentry?

Why not just use MyBackendType?

> + stats_shmem = &pgStatLocal.shmem->io_ops;
> +
> + dest_io_path_ops =
> + &stats_shmem->stats[backend_type_get_idx(beentry->st_backendType)];
> +
> + pgstat_begin_changecount_write(&stats_shmem->changecount);

A mentioned before, the changecount stuff doesn't apply here. You need a
lock.

> +PgStat_IOPathOps *
> +pgstat_fetch_backend_io_path_ops(void)
> +{
> + pgstat_snapshot_fixed(PGSTAT_KIND_IOOPS);
> + return pgStatLocal.snapshot.io_path_ops;
> +}
> +
> +PgStat_Counter
> +pgstat_fetch_cumulative_io_ops(IOPath io_path, IOOp io_op)
> +{
> + PgStat_IOOpCounters *counters = &cumulative_IOOpStats.data[io_path];
> +
> + switch (io_op)
> + {
> + case IOOP_ALLOC:
> + return counters->allocs;
> + case IOOP_EXTEND:
> + return counters->extends;
> + case IOOP_FSYNC:
> + return counters->fsyncs;
> + case IOOP_WRITE:
> + return counters->writes;
> + default:
> + elog(ERROR, "IO Operation %s for IO Path %s is undefined.",
> + pgstat_io_op_desc(io_op), pgstat_io_path_desc(io_path));
> + }
> +}

There's currently no user for this, right? Maybe let's just defer the
cumulative stuff until we need it?

> +const char *
> +pgstat_io_path_desc(IOPath io_path)
> +{
> + const char *io_path_desc = "Unknown IO Path";
> +

This should be unreachable, right?

> From f2b5b75f5063702cbc3c64efdc1e7ef3cf1acdb4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 4 Jul 2022 15:44:17 -0400
> Subject: [PATCH v22 3/3] Add system view tracking IO ops per backend type

> Add pg_stat_buffers, a system view which tracks the number of IO
> operations (allocs, writes, fsyncs, and extends) done through each IO
> path (e.g. shared buffers, local buffers, unbuffered IO) by each type of
> backend.

I think I like pg_stat_io a bit better? Nearly everything in here seems
to fit better in that.

I guess we could split out buffers allocated, but that's actually
interesting in the context of the kind of IO too.

> <row>
> <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
> <entry>One row only, showing statistics about WAL activity. See
> @@ -3595,7 +3604,102 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
> <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> </para>
> <para>
> - Time at which these statistics were last reset
> + Time at which these statistics were last reset.
> + </para></entry>

Grammar critique time :)

> +CREATE VIEW pg_stat_buffers AS
> +SELECT
> + b.backend_type,
> + b.io_path,
> + b.alloc,
> + b.extend,
> + b.fsync,
> + b.write,
> + b.stats_reset
> +FROM pg_stat_get_buffers() b;

Do we want to expose all data to all users? I guess pg_stat_bgwriter
does? But this does split things out a lot more...

> + for (int i = 0; i < BACKEND_NUM_TYPES; i++)
> + {
> + PgStat_IOOpCounters *counters = io_path_ops->data;
> + Datum backend_type_desc =
> + CStringGetTextDatum(GetBackendTypeDesc(idx_get_backend_type(i)));
> + /* const char *log_name = GetBackendTypeDesc(idx_get_backend_type(i)); */
> +
> + for (int j = 0; j < IOPATH_NUM_TYPES; j++)
> + {
> + Datum values[BUFFERS_NUM_COLUMNS];
> + bool nulls[BUFFERS_NUM_COLUMNS];
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));
> +
> + values[BUFFERS_COLUMN_BACKEND_TYPE] = backend_type_desc;
> + values[BUFFERS_COLUMN_IO_PATH] = CStringGetTextDatum(pgstat_io_path_desc(j));

Random musing: I wonder if we should start to use SQL level enums for
this kind of thing.

> DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
> DROP TABLE prevstats;
> +SELECT pg_stat_reset_shared('buffers');
> + pg_stat_reset_shared
> +----------------------
> +
> +(1 row)
> +
> +SELECT pg_stat_force_next_flush();
> + pg_stat_force_next_flush
> +--------------------------
> +
> +(1 row)
> +
> +SELECT write = 0 FROM pg_stat_buffers WHERE io_path = 'Shared' and backend_type = 'checkpointer';
> + ?column?
> +----------
> + t
> +(1 row)

Don't think you can rely on that. The lookup of the view, functions
might have needed to load catalog data, which might have needed to evict
buffers. I think you can do something more reliable by checking that
there's more written buffers after a checkpoint than before, or such.

Would be nice to have something testing that the ringbuffer stats stuff
does something sensible - that feels not entirely trivial.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-07-06 19:29:41 Re: EINTR in ftruncate()
Previous Message Alexander Korotkov 2022-07-06 18:53:41 Re: Add red-black tree missing comparison searches