Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: melanieplageman(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, pryzby(at)telsasoft(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, magnus(at)hagander(dot)net, pgsql-hackers(at)postgresql(dot)org, lukas(at)fittl(dot)com, thomas(dot)munro(at)gmail(dot)com
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-07-12 08:06:21
Message-ID: 20220712.170621.2168803584651416807.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 11 Jul 2022 22:22:28 -0400, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote in
> Hi,
>
> In the attached patch set, I've added in missing IO operations for
> certain IO Paths as well as enumerating in the commit message which IO
> Paths and IO Operations are not currently counted and or not possible.
>
> There is a TODO in HandleWalWriterInterrupts() about removing
> pgstat_report_wal() since it is immediately before a proc_exit()

Right. walwriter does that without needing the explicit call.

> I was wondering if LocalBufferAlloc() should increment the counter or if
> I should wait until GetLocalBufferStorage() to increment the counter.

Depends on what "allocate" means. Different from shared buffers, local
buffers are taken from OS then allocated to page. OS-allcoated pages
are restricted by num_temp_buffers so I think what we're interested in
is the count incremented by LocalBuferAlloc(). (And it is the parallel
of alloc for shared-buffers)

> I also realized that I am not differentiating between IOPATH_SHARED and
> IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type
> of buffer we are fsync'ing by the time we call register_dirty_segment(),
> I'm not sure how we would fix this.

I think there scarcely happens flush for strategy-loaded buffers. If
that is sensible, IOOP_FSYNC would not make much sense for
IOPATH_STRATEGY.

> On Wed, Jul 6, 2022 at 3:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > On 2022-07-05 13:24:55 -0400, Melanie Plageman wrote:
> > > @@ -176,6 +176,8 @@ InitStandaloneProcess(const char *argv0)
> > > {
> > > Assert(!IsPostmasterEnvironment);
> > >
> > > + MyBackendType = B_STANDALONE_BACKEND;
> >
> > Hm. This is used for singleuser mode as well as bootstrap. Should we
> > split those? It's not like bootstrap mode really matters for stats, so
> > I'm inclined not to.
> >
> >
> I have no opinion currently.
> It depends on how commonly you think developers might want separate
> bootstrap and single user mode IO stats.

Regarding to stats, I don't think separating them makes much sense.

> > > @@ -375,6 +376,8 @@ BootstrapModeMain(int argc, char *argv[], bool
> > check_only)
> > > * out the initial relation mapping files.
> > > */
> > > RelationMapFinishBootstrap();
> > > + // TODO: should this be done for bootstrap?
> > > + pgstat_report_io_ops();
> >
> > Hm. Not particularly useful, but also not harmful. But we don't need an
> > explicit call, because it'll be done at process exit too. At least I
> > think, it could be that it's different for bootstrap.
>
> I've removed this and other occurrences which were before proc_exit()
> (and thus redundant). (Though I did not explicitly check if it was
> different for bootstrap.)

pgstat_report_stat(true) is supposed to be called as needed via
before_shmem_hook so I think that's the right thing.

> > IMO it'd be a good idea to add pgstat_report_io_ops() to
> > pgstat_report_vacuum()/analyze(), so that the stats for a longrunning
> > autovac worker get updated more regularly.
> >
>
> noted and fixed.

> > How about moving the pgstat_report_io_ops() into
> > pgstat_report_bgwriter(), pgstat_report_autovacuum() etc? Seems
> > unnecessary to have multiple pgstat_* calls in these places.
> >
> >
> >
> noted and fixed.

+ * Also report IO Operations statistics

I think that the function comment also should mention this.

> > Wonder if it's worth making the lock specific to the backend type?
> >
>
> I've added another Lock into PgStat_IOPathOps so that each BackendType
> can be locked separately. But, I've also kept the lock in
> PgStatShared_BackendIOPathOps so that reset_all and snapshot could be
> done easily.

Looks fine about the lock separation.
By the way, in the following line:

+ &pgStatLocal.shmem->io_ops.stats[backend_type_get_idx(MyBackendType)];

backend_type_get_idx(x) is actually (x - 1) plus assertion on the
value range. And the only use-case is here. There's an reverse
function and also used only at one place.

+ Datum backend_type_desc =
+ CStringGetTextDatum(GetBackendTypeDesc(idx_get_backend_type(i)));

In this usage GetBackendTypeDesc() gracefully treats out-of-domain
values but idx_get_backend_type keenly kills the process for the
same. This is inconsistent.

My humbel opinion on this is we don't define the two functions and
replace the calls to them with (x +/- 1). Addition to that, I think
we should not abort() by invalid backend types. In that sense, I
wonder if we could use B_INVALIDth element for this purpose.

> > > + LWLockAcquire(&stats_shmem->lock, LW_SHARED);
> > > + memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
> > > + LWLockRelease(&stats_shmem->lock);
> >
> > Which then also means that you don't need the reset offset stuff. It's
> > only there because with the changecount approach we can't take a lock to
> > reset the stats (since there is no lock). With a lock you can just reset
> > the shared state.
> >
>
> Yes, I believe I have cleaned up all of this embarrassing mess. I use the
> lock in PgStatShared_BackendIOPathOps for reset all and snapshot and the
> locks in PgStat_IOPathOps for flush.

Looks fine, but I think pgstat_flush_io_ops() need more comments like
other pgstat_flush_* functions.

+ for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+ stats_shmem->stats[i].stat_reset_timestamp = ts;

I'm not sure we need a separate reset timestamp for each backend type
but SLRU counter does the same thing..

> > > +pgstat_report_io_ops(void)
> > > +{
> >
> > This shouldn't be needed - the flush function above can be used.
> >
>
> Fixed.

The commit message of 0002 contains that name:p

> > > +const char *
> > > +pgstat_io_path_desc(IOPath io_path)
> > > +{
> > > + const char *io_path_desc = "Unknown IO Path";
> > > +
> >
> > This should be unreachable, right?
> >
>
> Changed it to an error.

+ elog(ERROR, "Attempt to describe an unknown IOPath");

I think we usually spell it as ("unrecognized IOPath value: %d", io_path).

> > > From f2b5b75f5063702cbc3c64efdc1e7ef3cf1acdb4 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > > Date: Mon, 4 Jul 2022 15:44:17 -0400
> > > Subject: [PATCH v22 3/3] Add system view tracking IO ops per backend type
> >
> > > Add pg_stat_buffers, a system view which tracks the number of IO
> > > operations (allocs, writes, fsyncs, and extends) done through each IO
> > > path (e.g. shared buffers, local buffers, unbuffered IO) by each type of
> > > backend.
> >
> > I think I like pg_stat_io a bit better? Nearly everything in here seems
> > to fit better in that.
> >
> > I guess we could split out buffers allocated, but that's actually
> > interesting in the context of the kind of IO too.
> >
>
> changed it to pg_stat_io

A bit different thing, but I felt a little uneasy about some uses of
"pgstat_io_ops". IOOp looks like a neighbouring word of IOPath. On the
other hand, actually iopath is used as an attribute of io_ops in many
places. Couldn't we be more consistent about the relationship between
the names?

IOOp -> PgStat_IOOpType
IOPath -> PgStat_IOPath
PgStat_IOOpCOonters -> PgStat_IOCounters
PgStat_IOPathOps -> PgStat_IO
pgstat_count_io_op -> pgstat_count_io
...

(Better wordings are welcome.)

> > > +CREATE VIEW pg_stat_buffers AS
> > > +SELECT
> > > + b.backend_type,
> > > + b.io_path,
> > > + b.alloc,
> > > + b.extend,
> > > + b.fsync,
> > > + b.write,
> > > + b.stats_reset
> > > +FROM pg_stat_get_buffers() b;
> >
> > Do we want to expose all data to all users? I guess pg_stat_bgwriter
> > does? But this does split things out a lot more...
> >
> >
> I didn't see another similar example limiting access.

(The doc told me that) pg_buffercache view is restricted to
pg_monitor. But other activity-stats(aka stats collector:)-related
pg_stat_* views are not restricted to pg_monitor.

doc> pg_monitor Read/execute various monitoring views and functions.

Hmm....

> > Don't think you can rely on that. The lookup of the view, functions
> > might have needed to load catalog data, which might have needed to evict
> > buffers. I think you can do something more reliable by checking that
> > there's more written buffers after a checkpoint than before, or such.
> >
> >
> Yes, per an off list suggestion by you, I have changed the tests to use a
> sum of writes. I've also added a test for IOPATH_LOCAL and fixed some of
> the missing calls to count IO Operations for IOPATH_LOCAL and
> IOPATH_STRATEGY.
>
> I struggled to come up with a way to test writes for a particular
> type of backend are counted correctly since a dirty buffer could be
> written out by another type of backend before the target BackendType has
> a chance to write it out.
>
> I also struggled to come up with a way to test IO operations for
> background workers. I'm not sure of a way to deterministically have a
> background worker do a particular kind of IO in a test scenario.
>
> I'm not sure how to cause a strategy "extend" for testing.

I'm not sure what you are expecting, but for example, "create table t
as select generate_series(0, 99999)" increments Strategy-extend by
about 400. (I'm surprised that autovac worker-shared-extend has
non-zero number)

> > Would be nice to have something testing that the ringbuffer stats stuff
> > does something sensible - that feels not entirely trivial.
> >
> >
> I've added a test to test that reused strategy buffers are counted as
> allocs. I would like to add a test which checks that if a buffer in the
> ring is pinned and thus not reused, that it is not counted as a strategy
> alloc, but I found it challenging without a way to pause vacuuming, pin
> a buffer, then resume vacuuming.

===

If I'm not missing something, in BufferAlloc, when strategy is not
used and the victim is dirty, iopath is determined based on the
uninitialized from_ring. It seems to me from_ring is equivalent to
strategy_current_was_in_ring. And if StrategyGetBuffer has set
from_ring to false, StratetgyRejectBuffer may set it to true, which is
is wrong. The logic around there seems to need a rethink.

What can we read from the values separated to Shared and Strategy?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2022-07-12 08:08:16 Re: replacing role-level NOINHERIT with a grant-level option
Previous Message Masahiko Sawada 2022-07-12 07:42:24 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns