Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-11-29 02:05:33
Message-ID: CAAKRu_Zvaj_yFA_eiSRrLZsjhT0J8cJ044QhZfKuXq6WN5bu5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

v38 attached.

On Sun, Nov 20, 2022 at 7:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> One good follow up patch will be to rip out the accounting for
> pg_stat_bgwriter's buffers_backend, buffers_backend_fsync and perhaps
> buffers_alloc and replace it with a subselect getting the equivalent data from
> pg_stat_io. It might not be quite worth doing for buffers_alloc because of
> the way that's tied into bgwriter pacing.

I don't see how it will make sense to have buffers_backend and
buffers_backend_fsync respond to a different reset target than the rest
of the fields in pg_stat_bgwriter.

> On 2022-11-03 13:00:24 -0400, Melanie Plageman wrote:
> > @@ -833,6 +836,22 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> >
> > isExtend = (blockNum == P_NEW);
> >
> > + if (isLocalBuf)
> > + {
> > + /*
> > + * Though a strategy object may be passed in, no strategy is employed
> > + * when using local buffers. This could happen when doing, for example,
> > + * CREATE TEMPORRARY TABLE AS ...
> > + */
> > + io_context = IOCONTEXT_BUFFER_POOL;
> > + io_object = IOOBJECT_TEMP_RELATION;
> > + }
> > + else
> > + {
> > + io_context = IOContextForStrategy(strategy);
> > + io_object = IOOBJECT_RELATION;
> > + }
>
> I think given how frequently ReadBuffer_common() is called in some workloads,
> it'd be good to make IOContextForStrategy inlinable. But I guess that's not
> easily doable, because struct BufferAccessStrategyData is only defined in
> freelist.c.

Correct

> Could we defer this until later, given that we don't currently need this in
> case of buffer hits afaict?

Yes, you are right. In ReadBuffer_common(), we can easily move the
IOContextForStrategy() call to directly before using io_context. I've
done that in the attached version.

> > @@ -1121,6 +1144,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> > BufferAccessStrategy strategy,
> > bool *foundPtr)
> > {
> > + bool from_ring;
> > + IOContext io_context;
> > BufferTag newTag; /* identity of requested block */
> > uint32 newHash; /* hash value for newTag */
> > LWLock *newPartitionLock; /* buffer partition lock for it */
> > @@ -1187,9 +1212,12 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> > */
> > LWLockRelease(newPartitionLock);
> >
> > + io_context = IOContextForStrategy(strategy);
>
> Hm - doesn't this mean we do IOContextForStrategy() twice? Once in
> ReadBuffer_common() and then again here?

Yes. So, there are a few options for addressing this.

- if the goal is to call IOStrategyForContext() exactly once in a
given codepath, BufferAlloc() can set IOContext
(passed by reference as an output parameter). I don't like this much
because it doesn't make sense to me that BufferAlloc() would set the
"io_context" parameter -- especially given that strategy is already
passed as a parameter and is obviously available to the caller.
I also don't see a good way of waiting until BufferAlloc() returns to count
the IO operations counted in FlushBuffer() and BufferAlloc() itself.

- if the goal is to avoid calling IOStrategyForContext() in more common
codepaths or to call it as close to its use as possible, then we can
push down its call in BufferAlloc() to the two locations where it is
used -- when a dirty buffer must be flushed and when a block was
evicted or reused. This will avoid calling it when we are not evicting
a block from a valid buffer.

However, if we do that, I don't know how to avoid calling it twice in
that codepath. Even though we can assume io_context was set in the
first location by the time we get to the second location, we would
need to initialize the variable with something if we only plan to set
it in some branches and there is no "invalid" or "default" value of
the IOContext enum.

Given the above, I've left the call in BufferAlloc() as is in the
attached version.

>
>
> > /* Loop here in case we have to try another victim buffer */
> > for (;;)
> > {
> > +
> > /*
> > * Ensure, while the spinlock's not yet held, that there's a free
> > * refcount entry.
> > @@ -1200,7 +1228,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> > * Select a victim buffer. The buffer is returned with its header
> > * spinlock still held!
> > */
> > - buf = StrategyGetBuffer(strategy, &buf_state);
> > + buf = StrategyGetBuffer(strategy, &buf_state, &from_ring);
> >
> > Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
> >
>
> I think patch 0001 relies on this change already having been made, If I am not misunderstanding?

Fixed.

>
>
> > @@ -1263,13 +1291,34 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> > }
> > }
> >
> > + /*
> > + * When a strategy is in use, only flushes of dirty buffers
> > + * already in the strategy ring are counted as strategy writes
> > + * (IOCONTEXT [BULKREAD|BULKWRITE|VACUUM] IOOP_WRITE) for the
> > + * purpose of IO operation statistics tracking.
> > + *
> > + * If a shared buffer initially added to the ring must be
> > + * flushed before being used, this is counted as an
> > + * IOCONTEXT_BUFFER_POOL IOOP_WRITE.
> > + *
> > + * If a shared buffer added to the ring later because the
>
> Missing word?

Fixed.

>
>
> > + * current strategy buffer is pinned or in use or because all
> > + * strategy buffers were dirty and rejected (for BAS_BULKREAD
> > + * operations only) requires flushing, this is counted as an
> > + * IOCONTEXT_BUFFER_POOL IOOP_WRITE (from_ring will be false).
>
> I think this makes sense for now, but it'd be good if somebody else could
> chime in on this...
>
> > + *
> > + * When a strategy is not in use, the write can only be a
> > + * "regular" write of a dirty shared buffer (IOCONTEXT_BUFFER_POOL
> > + * IOOP_WRITE).
> > + */
> > +
> > /* OK, do the I/O */
> > TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
> > smgr->smgr_rlocator.locator.spcOid,
> > smgr->smgr_rlocator.locator.dbOid,
> > smgr->smgr_rlocator.locator.relNumber);
> >
> > - FlushBuffer(buf, NULL);
> > + FlushBuffer(buf, NULL, io_context, IOOBJECT_RELATION);
> > LWLockRelease(BufferDescriptorGetContentLock(buf));
> > ScheduleBufferTagForWriteback(&BackendWritebackContext,
>
>
>
> > + if (oldFlags & BM_VALID)
> > + {
> > + /*
> > + * When a BufferAccessStrategy is in use, evictions adding a
> > + * shared buffer to the strategy ring are counted in the
> > + * corresponding strategy's context.
>
> Perhaps "adding a shared buffer to the ring are counted in the corresponding
> context"? "strategy's context" sounds off to me.

Fixed.

> > This includes the evictions
> > + * done to add buffers to the ring initially as well as those
> > + * done to add a new shared buffer to the ring when current
> > + * buffer is pinned or otherwise in use.
>
> I think this sentence could use a few commas, but not sure.
>
> s/current/the current/?

Reworded.

>
> > + * We wait until this point to count reuses and evictions in order to
> > + * avoid incorrectly counting a buffer as reused or evicted when it was
> > + * released because it was concurrently pinned or in use or counting it
> > + * as reused when it was rejected or when we errored out.
> > + */
>
> I can't quite parse this sentence.

I've reworded the whole comment.
I think it is clearer now.

>
> > + IOOp io_op = from_ring ? IOOP_REUSE : IOOP_EVICT;
> > +
> > + pgstat_count_io_op(io_op, IOOBJECT_RELATION, io_context);
> > + }
>
> I'd just inline the variable, but ...

Done.

> > @@ -196,6 +197,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
> > LocalRefCount[b]++;
> > ResourceOwnerRememberBuffer(CurrentResourceOwner,
> > BufferDescriptorGetBuffer(bufHdr));
> > +
> > break;
> > }
> > }
>
> Spurious change.

Removed.

> > pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
> >
> > *foundPtr = false;
> > +
> > return bufHdr;
> > }
>
> Dito.

Removed.

> > +/*
> > +* IO Operation statistics are not collected for all BackendTypes.
> > +*
> > +* The following BackendTypes do not participate in the cumulative stats
> > +* subsystem or do not do IO operations worth reporting statistics on:
>
> s/worth reporting/we currently report/?

Updated

> > + /*
> > + * In core Postgres, only regular backends and WAL Sender processes
> > + * executing queries will use local buffers and operate on temporary
> > + * relations. Parallel workers will not use local buffers (see
> > + * InitLocalBuffers()); however, extensions leveraging background workers
> > + * have no such limitation, so track IO Operations on
> > + * IOOBJECT_TEMP_RELATION for BackendType B_BG_WORKER.
> > + */
> > + no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER || bktype
> > + == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER || bktype ==
> > + B_STANDALONE_BACKEND || bktype == B_STARTUP;
> > +
> > + if (no_temp_rel && io_context == IOCONTEXT_BUFFER_POOL && io_object ==
> > + IOOBJECT_TEMP_RELATION)
> > + return false;
>
> Personally I don't like line breaks on the == and would rather break earlier
> on the && or ||.

I've gone through and fixed all of these that I could find.

> > + for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> > + {
> > + PgStatShared_IOObjectOps *shared_objs = &type_shstats->data[io_context];
> > + PgStat_IOObjectOps *pending_objs = &pending_IOOpStats.data[io_context];
> > +
> > + for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
> > + {
>
> Is there any compiler that'd complain if you used IOContext/IOObject/IOOp as the
> type in the for loop? I don't think so? Then you'd not need the casts in other
> places, which I think would make the code easier to read.

I changed the type and currently get no compiler warnings, however, on
a previous CI run,
with the type changed to an enum I got the following warning:

/tmp/cirrus-ci-build/src/include/utils/pgstat_internal.h:605:48:
error: no ‘operator++(int)’ declared for postfix ‘++’ [-fpermissive]
605 | io_context < IOCONTEXT_NUM_TYPES; io_context++)

I'm not sure why I am no longer getting it.

> > + PgStat_IOOpCounters *sharedent = &shared_objs->data[io_object];
> > + PgStat_IOOpCounters *pendingent = &pending_objs->data[io_object];
> > +
> > + if (!expect_backend_stats ||
> > + !pgstat_bktype_io_context_io_object_valid(MyBackendType,
> > + (IOContext) io_context, (IOObject) io_object))
> > + {
> > + pgstat_io_context_ops_assert_zero(sharedent);
> > + pgstat_io_context_ops_assert_zero(pendingent);
> > + continue;
> > + }
> > +
> > + for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> > + {
> > + if (!(pgstat_io_op_valid(MyBackendType, (IOContext) io_context,
> > + (IOObject) io_object, (IOOp) io_op)))
>
> Superfluous parens after the !, I think?

Thanks! I've looked for other occurrences as well and fixed them.

> > void
> > pgstat_report_vacuum(Oid tableoid, bool shared,
> > @@ -257,10 +257,18 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
> > }
> >
> > pgstat_unlock_entry(entry_ref);
> > +
> > + /*
> > + * Flush IO Operations statistics now. pgstat_report_stat() will flush IO
> > + * Operation stats, however this will not be called after an entire
>
> Missing "until"?

Fixed.

> > +static inline void
> > +pgstat_io_op_assert_zero(PgStat_IOOpCounters *counters, IOOp io_op)
> > +{
>
> Does this need to be in pgstat.h? Perhaps pgstat_internal.h would suffice,
> afaict it's not used outside of pgstat code?

It is used in pgstatfuncs.c during the view creation.

> > +
> > +/*
> > + * Assert that stats have not been counted for any combination of IOContext,
> > + * IOObject, and IOOp which is not valid for the passed-in BackendType. The
> > + * passed-in array of PgStat_IOOpCounters must contain stats from the
> > + * BackendType specified by the second parameter. Caller is responsible for
> > + * locking of the passed-in PgStatShared_IOContextOps, if needed.
> > + */
> > +static inline void
> > +pgstat_backend_io_stats_assert_well_formed(PgStatShared_IOContextOps *backend_io_context_ops,
> > + BackendType bktype)
> > +{
>
> This doesn't look like it should be an inline function - it's quite long.
>
> I think it's also too complicated for the compiler to optimize out if
> assertions are disabled. So you'd need to handle this with an explicit #ifdef
> USE_ASSERT_CHECKING.

I've made it a static helper function in pgstat.c.

>
> > + <row>
> > + <entry role="catalog_table_entry"><para role="column_definition">
> > + <structfield>io_context</structfield> <type>text</type>
> > + </para>
> > + <para>
> > + The context or location of an IO operation.
> > + </para>
> > + <itemizedlist>
> > + <listitem>
> > + <para>
> > + <varname>io_context</varname> <literal>buffer pool</literal> refers to
> > + IO operations on data in both the shared buffer pool and process-local
> > + buffer pools used for temporary relation data.
> > + </para>
> > + <para>
>
> The indentation in the sgml part of the patch seems to be a bit wonky.

I'll address this and the other docs feedback in a separate patchset and email.

> > +Datum
> > +pg_stat_get_io(PG_FUNCTION_ARGS)
> > +{
> > + PgStat_BackendIOContextOps *backends_io_stats;
> > + ReturnSetInfo *rsinfo;
> > + Datum reset_time;
> > +
> > + InitMaterializedSRF(fcinfo, 0);
> > + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > +
> > + backends_io_stats = pgstat_fetch_backend_io_context_ops();
> > +
> > + reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> > +
> > + for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
> > + {
> > + Datum bktype_desc = CStringGetTextDatum(GetBackendTypeDesc((BackendType) bktype));
> > + bool expect_backend_stats = true;
> > + PgStat_IOContextOps *io_context_ops = &backends_io_stats->stats[bktype];
> > +
> > + /*
> > + * For those BackendTypes without IO Operation stats, skip
> > + * representing them in the view altogether.
> > + */
> > + expect_backend_stats = pgstat_io_op_stats_collected((BackendType)
> > + bktype);
> > +
> > + for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> > + {
> > + const char *io_context_str = pgstat_io_context_desc(io_context);
> > + PgStat_IOObjectOps *io_objs = &io_context_ops->data[io_context];
> > +
> > + for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
> > + {
> > + PgStat_IOOpCounters *counters = &io_objs->data[io_object];
> > + const char *io_obj_str = pgstat_io_object_desc(io_object);
> > +
> > + Datum values[IO_NUM_COLUMNS] = {0};
> > + bool nulls[IO_NUM_COLUMNS] = {0};
> > +
> > + /*
> > + * Some combinations of IOContext, IOObject, and BackendType are
> > + * not valid for any type of IOOp. In such cases, omit the
> > + * entire row from the view.
> > + */
> > + if (!expect_backend_stats ||
> > + !pgstat_bktype_io_context_io_object_valid((BackendType) bktype,
> > + (IOContext) io_context, (IOObject) io_object))
> > + {
> > + pgstat_io_context_ops_assert_zero(counters);
> > + continue;
> > + }
>
> Perhaps mention in a comment two loops up that we don't skip the nested loops
> despite !expect_backend_stats because we want to assert here?

Done.

I've also removed the test for bulkread reads from regress because
CREATE DATABASE is expensive and added it to the verify_heapam test
since it is one of the only users of a BULKREAD strategy which
unconditionally uses a BULKREAD strategy.

Thanks,
Melanie

Attachment Content-Type Size
v38-0001-Remove-BufferAccessStrategyData-current_was_in_r.patch application/octet-stream 5.4 KB
v38-0004-Add-system-view-tracking-IO-ops-per-backend-type.patch application/octet-stream 53.5 KB
v38-0002-Track-IO-operation-statistics-locally.patch application/octet-stream 28.1 KB
v38-0003-Aggregate-IO-operation-stats-per-BackendType.patch application/octet-stream 23.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2022-11-29 02:08:36 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Previous Message Robert Haas 2022-11-29 01:33:03 Re: fixing CREATEROLE