Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2023-01-13 23:36:00
Message-ID: 20230113233600.wbdzaixihlim34di@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-13 13:38:15 -0500, Melanie Plageman wrote:
> > I think that's marginally better, but I think having to define both
> > FIRST and NUM is excessive and doesn't make it less fragile. Not sure
> > what anyone else will say, but I'd prefer if it started at "0".

The reason for using FIRST is to be able to declare the loop variable as the
enum type, without assigning a plain numeric value to an enum variable. I
slightly prefer it that way.
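
I.e. something like this rough sketch (the enumerator names/order are just
illustrative, whatever the patch actually defines):

typedef enum IOOp
{
    IOOP_EVICT = 0,
    IOOP_EXTEND,
    IOOP_FSYNC,
    IOOP_READ,
    IOOP_REUSE,
    IOOP_WRITE,
} IOOp;

#define IOOP_FIRST      IOOP_EVICT
#define IOOP_NUM_TYPES  (IOOP_WRITE + 1)

which lets the loops stay enum-typed throughout:

for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
    ...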

> From f8c9077631169a778c893fd16b7a973ad5725f2a Mon Sep 17 00:00:00 2001
> From: Andres Freund <andres(at)anarazel(dot)de>
> Date: Fri, 9 Dec 2022 18:23:19 -0800
> Subject: [PATCH v47 1/5] pgindent and some manual cleanup in pgstat related

Applied.

> Subject: [PATCH v47 2/5] pgstat: Infrastructure to track IO operations

> diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
> index 0fa5370bcd..608c3b59da 100644
> --- a/src/backend/utils/activity/pgstat.c
> +++ b/src/backend/utils/activity/pgstat.c

Reminder to self: Need to bump PGSTAT_FILE_FORMAT_ID before commit.

Perhaps you could add a note about that to the commit message?

> @@ -359,6 +360,15 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
> .snapshot_cb = pgstat_checkpointer_snapshot_cb,
> },
>
> + [PGSTAT_KIND_IO] = {
> + .name = "io_ops",

That should be "io" now I think?

> +/*
> + * Check that stats have not been counted for any combination of IOContext,
> + * IOObject, and IOOp which are not tracked for the passed-in BackendType. The
> + * passed-in PgStat_BackendIO must contain stats from the BackendType specified
> + * by the second parameter. Caller is responsible for locking the passed-in
> + * PgStat_BackendIO, if needed.
> + */

Other PgStat_Backend* structs are just for pending data. Perhaps we could
rename it slightly to make that clearer? PgStat_BktypeIO?
PgStat_IOForBackendType? or a similar variation?

> +bool
> +pgstat_bktype_io_stats_valid(PgStat_BackendIO *backend_io,
> + BackendType bktype)
> +{
> + bool bktype_tracked = pgstat_tracks_io_bktype(bktype);
> +
> + for (IOContext io_context = IOCONTEXT_FIRST;
> + io_context < IOCONTEXT_NUM_TYPES; io_context++)
> + {
> + for (IOObject io_object = IOOBJECT_FIRST;
> + io_object < IOOBJECT_NUM_TYPES; io_object++)
> + {
> + /*
> + * Don't bother trying to skip to the next loop iteration if
> + * pgstat_tracks_io_object() would return false here. We still
> + * need to validate that each counter is zero anyway.
> + */
> + for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
> + {
> + if ((!bktype_tracked || !pgstat_tracks_io_op(bktype, io_context, io_object, io_op)) &&
> + backend_io->data[io_context][io_object][io_op] != 0)
> + return false;

Hm, perhaps this could be broken up into multiple lines? Something like

/* no stats, so nothing to validate */
if (backend_io->data[io_context][io_object][io_op] == 0)
    continue;

/* something went wrong if we have stats for something not tracked */
if (!bktype_tracked ||
    !pgstat_tracks_io_op(bktype, io_context, io_object, io_op))
    return false;

> +typedef struct PgStat_BackendIO
> +{
> + PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
> +} PgStat_BackendIO;

Would it bother you if we swapped the order of iocontext and iobject here and
related places? It makes more sense to me semantically, and should now be
pretty easy, code wise.
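
I.e. roughly (sketch, keeping the current struct name, just swapping the
dimensions):

typedef struct PgStat_BackendIO
{
    PgStat_Counter data[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
} PgStat_BackendIO;

with accesses then being data[io_object][io_context][io_op] rather than
data[io_context][io_object][io_op].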

> +/* shared version of PgStat_IO */
> +typedef struct PgStatShared_IO
> +{

Maybe /* PgStat_IO in shared memory */?

> Subject: [PATCH v47 3/5] pgstat: Count IO for relations

Nearly happy with this now. See one minor nit below.

I don't love the counting in register_dirty_segment() and mdsyncfiletag(), but
I don't have a better idea, and it doesn't seem too horrible.

> @@ -1441,6 +1474,28 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>
> UnlockBufHdr(buf, buf_state);
>
> + if (oldFlags & BM_VALID)
> + {
> + /*
> + * When a BufferAccessStrategy is in use, blocks evicted from shared
> + * buffers are counted as IOOP_EVICT in the corresponding context
> + * (e.g. IOCONTEXT_BULKWRITE). Shared buffers are evicted by a
> + * strategy in two cases: 1) while initially claiming buffers for the
> + * strategy ring 2) to replace an existing strategy ring buffer
> + * because it is pinned or in use and cannot be reused.
> + *
> + * Blocks evicted from buffers already in the strategy ring are
> + * counted as IOOP_REUSE in the corresponding strategy context.
> + *
> + * At this point, we can accurately count evictions and reuses,
> + * because we have successfully claimed the valid buffer. Previously,
> + * we may have been forced to release the buffer due to concurrent
> + * pinners or erroring out.
> + */
> + pgstat_count_io_op(from_ring ? IOOP_REUSE : IOOP_EVICT,
> + IOOBJECT_RELATION, *io_context);
> + }
> +
> if (oldPartitionLock != NULL)
> {
> BufTableDelete(&oldTag, oldHash);

There's no reason to do this while we still hold the buffer partition lock,
right? That's a highly contended lock, and we can just move the counting a few
lines down.
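
I.e. roughly this (untested sketch):

    if (oldPartitionLock != NULL)
    {
        BufTableDelete(&oldTag, oldHash);
        LWLockRelease(oldPartitionLock);
    }

    /* count the eviction/reuse only once the partition lock is released */
    if (oldFlags & BM_VALID)
        pgstat_count_io_op(from_ring ? IOOP_REUSE : IOOP_EVICT,
                           IOOBJECT_RELATION, *io_context);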

> @@ -1410,6 +1432,9 @@ mdsyncfiletag(const FileTag *ftag, char *path)
> if (need_to_close)
> FileClose(file);
>
> + if (result >= 0)
> + pgstat_count_io_op(IOOP_FSYNC, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> +

I'd lean towards doing this unconditionally; it's still an fsync even if it
failed... Not that it matters.
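
I.e. just (sketch):

    /* count the fsync whether or not it succeeded */
    pgstat_count_io_op(IOOP_FSYNC, IOOBJECT_RELATION, IOCONTEXT_NORMAL);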

> Subject: [PATCH v47 4/5] Add system view tracking IO ops per backend type

Note to self + commit message: Remember the need to do a catversion bump.

> +-- pg_stat_io test:
> +-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy.

Maybe add that "whereas a sequential scan does not, see ..."?

> This allows
> +-- us to reliably test that pg_stat_io BULKREAD reads are being captured
> +-- without relying on the size of shared buffers or on an expensive operation
> +-- like CREATE DATABASE.

CREATE / DROP TABLESPACE is also pretty expensive, but I don't have a better
idea.

> +-- Create an alternative tablespace and move the heaptest table to it, causing
> +-- it to be rewritten.

IIRC the point of that is that it reliably evicts all the buffers from s_b,
correct? If so, mention that?

> +Datum
> +pg_stat_get_io(PG_FUNCTION_ARGS)
> +{
> + ReturnSetInfo *rsinfo;
> + PgStat_IO *backends_io_stats;
> + Datum reset_time;
> +
> + InitMaterializedSRF(fcinfo, 0);
> + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +
> + backends_io_stats = pgstat_fetch_stat_io();
> +
> + reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> +
> + for (BackendType bktype = B_INVALID; bktype < BACKEND_NUM_TYPES; bktype++)
> + {
> + bool bktype_tracked;
> + Datum bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
> + PgStat_BackendIO *bktype_stats = &backends_io_stats->stats[bktype];
> +
> + /*
> + * For those BackendTypes without IO Operation stats, skip
> + * representing them in the view altogether. We still loop through
> + * their counters so that we can assert that all values are zero.
> + */
> + bktype_tracked = pgstat_tracks_io_bktype(bktype);

How about instead just doing Assert(pgstat_bktype_io_stats_valid(...))? That
deduplicates the logic for the asserts, and avoids doing the full loop when
assertions aren't enabled anyway?
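
I.e. something like (sketch):

    /* all counters for untracked combinations need to be zero */
    Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));

    /*
     * For BackendTypes without IO Operation stats, skip representing them
     * in the view altogether.
     */
    if (!pgstat_tracks_io_bktype(bktype))
        continue;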

Otherwise, see also the suggestion about formatting the assertions that I
made for 0002.

> +-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
> +-- and fsyncs.
> +-- The second checkpoint ensures that stats from the first checkpoint have been
> +-- reported and protects against any potential races amongst the table
> +-- creation, a possible timing-triggered checkpoint, and the explicit
> +-- checkpoint in the test.

There's a comment about the subsequent checkpoints earlier in the file, and I
think that comment is slightly more precise. Maybe just reference the earlier
comment?

> +-- Change the tablespace so that the table is rewritten directly, then SELECT
> +-- from it to cause it to be read back into shared buffers.
> +SET allow_in_place_tablespaces = true;
> +CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';

Perhaps worth doing this in tablespace.sql, to avoid the additional
checkpoints done as part of CREATE/DROP TABLESPACE?

Or, at least combine this with the CHECKPOINTs above?

> +-- Drop the table so we can drop the tablespace later.
> +DROP TABLE test_io_shared;
> +-- Test that the follow IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
> +-- - eviction of local buffers in order to reuse them
> +-- - reads of temporary table blocks into local buffers
> +-- - writes of local buffers to permanent storage
> +-- - extends of temporary tables
> +-- Set temp_buffers to a low value so that we can trigger writes with fewer
> +-- inserted tuples. Do so in a new session in case temporary tables have been
> +-- accessed by previous tests in this session.
> +\c
> +SET temp_buffers TO '1MB';

I'd set it to the actual minimum '100' (in pages). Perhaps that'd allow making
test_io_local a bit smaller?
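
I.e. (sketch):

    SET temp_buffers TO 100;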

> +CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
> +SELECT sum(extends) AS io_sum_local_extends_before
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +SELECT sum(evictions) AS io_sum_local_evictions_before
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +SELECT sum(writes) AS io_sum_local_writes_before
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +-- Insert tuples into the temporary table, generating extends in the stats.
> +-- Insert enough values that we need to reuse and write out dirty local
> +-- buffers, generating evictions and writes.
> +INSERT INTO test_io_local SELECT generate_series(1, 8000) as id, repeat('a', 100);
> +SELECT sum(reads) AS io_sum_local_reads_before
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset

Maybe add something like

SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;

Better toast compression or such could easily make test_io_local smaller than
it is today. Seeing that it's too small would make it easier to understand the
failure.

> +SELECT sum(evictions) AS io_sum_local_evictions_after
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +SELECT sum(reads) AS io_sum_local_reads_after
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +SELECT sum(writes) AS io_sum_local_writes_after
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
> +SELECT sum(extends) AS io_sum_local_extends_after
> + FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset

This could just be one SELECT with multiple columns?

I think if you use something like \gset io_sum_local_after_ you can also avoid
the need to repeat "io_sum_local_" so many times.
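
E.g. something like (sketch; the prefix and column names are just for
illustration):

SELECT sum(evictions) AS evictions,
       sum(reads) AS reads,
       sum(writes) AS writes,
       sum(extends) AS extends
  FROM pg_stat_io
  WHERE io_context = 'normal' AND io_object = 'temp relation' \gset io_sum_local_after_

SELECT :io_sum_local_after_evictions > :io_sum_local_evictions_before,
       :io_sum_local_after_reads > :io_sum_local_reads_before,
       :io_sum_local_after_writes > :io_sum_local_writes_before,
       :io_sum_local_after_extends > :io_sum_local_extends_before;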

> +SELECT :io_sum_local_evictions_after > :io_sum_local_evictions_before;
> + ?column?
> +----------
> + t
> +(1 row)
> +
> +SELECT :io_sum_local_reads_after > :io_sum_local_reads_before;
> + ?column?
> +----------
> + t
> +(1 row)
> +
> +SELECT :io_sum_local_writes_after > :io_sum_local_writes_before;
> + ?column?
> +----------
> + t
> +(1 row)
> +
> +SELECT :io_sum_local_extends_after > :io_sum_local_extends_before;
> + ?column?
> +----------
> + t
> +(1 row)

Similar.

> +SELECT sum(reuses) AS io_sum_vac_strategy_reuses_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset
> +SELECT sum(reads) AS io_sum_vac_strategy_reads_before FROM pg_stat_io WHERE io_context = 'vacuum' \gset

There are quite a few more instances of this, so I'll omit further mentions
from here on.

Greetings,

Andres Freund
