Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-07-12 16:19:06
Message-ID: CAAKRu_ZFUkRKP08-L0413+g3XtYCjqJNPT2A7q0eT_otz_ei=g@mail.gmail.com
Lists: pgsql-hackers

Thanks for the review!

On Tue, Jul 12, 2022 at 4:06 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Mon, 11 Jul 2022 22:22:28 -0400, Melanie Plageman <
> melanieplageman(at)gmail(dot)com> wrote in
> > Hi,
> >
> > In the attached patch set, I've added in missing IO operations for
> > certain IO Paths as well as enumerating in the commit message which IO
> > Paths and IO Operations are not currently counted and or not possible.
> >
> > There is a TODO in HandleWalWriterInterrupts() about removing
> > pgstat_report_wal() since it is immediately before a proc_exit()
>
> Right. walwriter does that without needing the explicit call.
>

I have deleted it.

>
> > I was wondering if LocalBufferAlloc() should increment the counter or if
> > I should wait until GetLocalBufferStorage() to increment the counter.
>
> Depends on what "allocate" means. Unlike shared buffers, local
> buffers are taken from the OS and then allocated to a page. OS-allocated
> pages are restricted by num_temp_buffers, so I think what we're interested
> in is the count incremented by LocalBufferAlloc(). (And it is the parallel
> of alloc for shared buffers.)
>

I've left it in LocalBufferAlloc().

>
> > I also realized that I am not differentiating between IOPATH_SHARED and
> > IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type
> > of buffer we are fsync'ing by the time we call register_dirty_segment(),
> > I'm not sure how we would fix this.
>
> I think flushes rarely happen for strategy-loaded buffers. If
> that is the case, IOOP_FSYNC would not make much sense for
> IOPATH_STRATEGY.
>

Why would it be less likely for a backend to do its own fsync when
flushing a dirty strategy buffer than a regular dirty shared buffer?

>
> > > IMO it'd be a good idea to add pgstat_report_io_ops() to
> > > pgstat_report_vacuum()/analyze(), so that the stats for a longrunning
> > > autovac worker get updated more regularly.
> > >
> >
> > noted and fixed.
>
> > > How about moving the pgstat_report_io_ops() into
> > > pgstat_report_bgwriter(), pgstat_report_autovacuum() etc? Seems
> > > unnecessary to have multiple pgstat_* calls in these places.
> > >
> > >
> > >
> > noted and fixed.
>
> + * Also report IO Operations statistics
>
> I think that the function comment also should mention this.
>

I've added comments at the top of all these functions.

>
> > > Wonder if it's worth making the lock specific to the backend type?
> > >
> >
> > I've added another Lock into PgStat_IOPathOps so that each BackendType
> > can be locked separately. But, I've also kept the lock in
> > PgStatShared_BackendIOPathOps so that reset_all and snapshot could be
> > done easily.
>
> Looks fine about the lock separation.
>

Actually, I think it is not safe to use both of these locks. If we pick
one method, it is probably better to go with the locks in
PgStat_IOPathOps: that is more efficient for flushing (though not for
fetching and resetting), so that is probably the way to go, right?

> By the way, in the following line:
>
> + &pgStatLocal.shmem->io_ops.stats[backend_type_get_idx(MyBackendType)];
>
> backend_type_get_idx(x) is actually (x - 1) plus an assertion on the
> value range, and the only use-case is here. There's a reverse
> function, also used in only one place.
>
> + Datum backend_type_desc =
> + CStringGetTextDatum(GetBackendTypeDesc(idx_get_backend_type(i)));
>
> In this usage GetBackendTypeDesc() gracefully handles out-of-domain
> values, but idx_get_backend_type() kills the process for the
> same. This is inconsistent.
>
> My humble opinion on this is that we not define the two functions and
> replace the calls to them with (x +/- 1). In addition, I think
> we should not abort() on invalid backend types. In that sense, I
> wonder if we could use the B_INVALIDth element for this purpose.
>

I think that GetBackendTypeDesc() should probably also error out for an
unknown value.

I would be open to not using the helper functions. I thought it would be
less error-prone, but since it is limited to the code in
pgstat_io_ops.c, it is probably okay. Let me think a bit more.

Could you explain more about what you mean about using B_INVALID
BackendType?

>
> > > > + LWLockAcquire(&stats_shmem->lock, LW_SHARED);
> > > > + memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
> > > > + LWLockRelease(&stats_shmem->lock);
> > >
> > > Which then also means that you don't need the reset offset stuff. It's
> > > only there because with the changecount approach we can't take a lock to
> > > reset the stats (since there is no lock). With a lock you can just reset
> > > the shared state.
> > >
> >
> > Yes, I believe I have cleaned up all of this embarrassing mess. I use the
> > lock in PgStatShared_BackendIOPathOps for reset all and snapshot and the
> > locks in PgStat_IOPathOps for flush.
>
> Looks fine, but I think pgstat_flush_io_ops() need more comments like
> other pgstat_flush_* functions.
>
> + for (int i = 0; i < BACKEND_NUM_TYPES; i++)
> + stats_shmem->stats[i].stat_reset_timestamp = ts;
>
> I'm not sure we need a separate reset timestamp for each backend type
> but SLRU counter does the same thing..
>

Yes, I think for SLRU stats it is because you can reset individual SLRU
stats. Also there is no wrapper data structure to put it in. I could
keep it in PgStatShared_BackendIOPathOps since you have to reset all IO
operation stats at once, but I am thinking of getting rid of
PgStatShared_BackendIOPathOps since it is not needed if I only keep the
locks in PgStat_IOPathOps and make the global shared value an array of
PgStat_IOPathOps.

>
> > > > +pgstat_report_io_ops(void)
> > > > +{
> > >
> > > This shouldn't be needed - the flush function above can be used.
> > >
> >
> > Fixed.
>
> The commit message of 0002 contains that name:p
>

Thanks! Fixed.

>
> > > > +const char *
> > > > +pgstat_io_path_desc(IOPath io_path)
> > > > +{
> > > > + const char *io_path_desc = "Unknown IO Path";
> > > > +
> > >
> > > This should be unreachable, right?
> > >
> >
> > Changed it to an error.
>
> + elog(ERROR, "Attempt to describe an unknown IOPath");
>
> I think we usually spell it as ("unrecognized IOPath value: %d", io_path).
>

I have changed to this.

>
> > > > From f2b5b75f5063702cbc3c64efdc1e7ef3cf1acdb4 Mon Sep 17 00:00:00 2001
> > > > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > > > Date: Mon, 4 Jul 2022 15:44:17 -0400
> > > > Subject: [PATCH v22 3/3] Add system view tracking IO ops per backend type
> > >
> > > > Add pg_stat_buffers, a system view which tracks the number of IO
> > > > operations (allocs, writes, fsyncs, and extends) done through each IO
> > > > path (e.g. shared buffers, local buffers, unbuffered IO) by each type of
> > > > backend.
> > >
> > > I think I like pg_stat_io a bit better? Nearly everything in here seems
> > > to fit better in that.
> > >
> > > I guess we could split out buffers allocated, but that's actually
> > > interesting in the context of the kind of IO too.
> > >
> >
> > changed it to pg_stat_io
>
> A slightly different thing, but I felt a little uneasy about some uses
> of "pgstat_io_ops". IOOp looks like a neighbouring word of IOPath. On
> the other hand, iopath is actually used as an attribute of io_ops in
> many places. Couldn't we be more consistent about the relationship
> between the names?
>
> IOOp -> PgStat_IOOpType
> IOPath -> PgStat_IOPath
> PgStat_IOOpCounters -> PgStat_IOCounters
> PgStat_IOPathOps -> PgStat_IO
> pgstat_count_io_op -> pgstat_count_io
> ...
>
> (Better wordings are welcome.)
>

Let me think about naming and make changes in the next version.

> > > Would be nice to have something testing that the ringbuffer stats stuff
> > > does something sensible - that feels not entirely trivial.
> > >
> > >
> > I've added a test to test that reused strategy buffers are counted as
> > allocs. I would like to add a test which checks that if a buffer in the
> > ring is pinned and thus not reused, that it is not counted as a strategy
> > alloc, but I found it challenging without a way to pause vacuuming, pin
> > a buffer, then resume vacuuming.
>
> ===
>
> If I'm not missing something, in BufferAlloc, when the strategy is not
> used and the victim is dirty, iopath is determined based on the
> uninitialized from_ring. It seems to me from_ring is equivalent to
> strategy_current_was_in_ring. And if StrategyGetBuffer has set
> from_ring to false, StrategyRejectBuffer may set it to true, which is
> wrong. The logic around there seems to need a rethink.
>
> What can we read from the values separated to Shared and Strategy?
>

I have changed this local variable so that it is only used to
communicate whether the buffer which was not rejected by
StrategyRejectBuffer() was from the ring, for the purposes of counting
strategy writes. I could add an accessor for this member
(strategy->current_was_in_ring) if that makes more sense. For strategy
allocs, I just use strategy->current_was_in_ring inside of
StrategyGetBuffer(), since it has access to that member of the struct.

Currently, strategy allocs count only reuses of a strategy buffer (not
the initial shared buffers which are added to the ring). Strategy
writes count only the writing out of dirty buffers which are already in
the ring and are being reused.

Alternatively, we could also count as strategy allocs all those buffers
which are added to the ring and count as strategy writes all those
shared buffers which are dirty when initially added to the ring.

- Melanie

Attachment Content-Type Size
v24-0001-Add-BackendType-for-standalone-backends.patch text/x-patch 2.7 KB
v24-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch text/x-patch 16.4 KB
v24-0002-Track-IO-operation-statistics.patch text/x-patch 32.6 KB
