Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2020-01-25 14:43:41
Message-ID: CABUevEw=TnHVtoJtD+A1KjPC8RefQGRGJh-LbCgdJ_pDbUw2Zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 24, 2020 at 8:52 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge
> whether backends are doing writes they shouldn't do. That's because it
> counts things that are either unavoidably or unlikely doable by other
> parts of the system (checkpointer, bgwriter).
> In particular extending the file can not currently be done by any
> another type of process, yet is counted. When using a buffer access
> strategy it is also very likely that writes have to be done by the
> 'dirtying' backend itself, as the buffer will be reused soon after (when
> not previously in s_b that is).

Yeah. That's quite annoying.

> Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> autovacuum et al.
>
>
> I think it'd make sense to at least split buffers_backend into
> buffers_backend_extend,
> buffers_backend_write,
> buffers_backend_write_strat
>
> but it could also be worthwhile to expand it into
> buffers_backend_extend,
> buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> buffers_{backend,autovacuum}_write_stat

Given that these are individual global counters, I don't really see
any reason not to expand it to the bigger set of counters. It's easy
enough to add them up together later if needed.

> Possibly by internally, in contrast to SQL level, having just counter
> arrays indexed by backend types.
>
>
> It's also noteworthy that buffers_backend is accounted in an absurd
> manner. One might think that writes are accounted from backend -> shared
> memory or such. But instead it works like this:
>
> 1) backend flushes buffer in bufmgr.c, accounts for backend *write time*
> 2) mdwrite writes and registers a sync request, which forwards the sync request to checkpointer
> 3) ForwardSyncRequest(), when not called by bgwriter, increments CheckpointerShmem->num_backend_writes
> 4) checkpointer, whenever doing AbsorbSyncRequests(), moves
> CheckpointerShmem->num_backend_writes to
> BgWriterStats.m_buf_written_backend (local memory!)
> 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to
> pgstat (which bgwriter also does)
> 6) Which then updates the shared memory used by the display functions
>
> Worthwhile to note that backend buffer read/write *time* is accounted
> differently. That's done via pgstat_send_tabstat().
>
>
> I think there's very little excuse for the indirection via checkpointer,
> besides architectually being weird, it actually requires that we
> continue to wake up checkpointer over and over instead of optimizing how
> and when we submit fsync requests.
>
> As far as I can tell we're also simply not accounting at all for writes
> done outside of shared buffers. All writes done directly through
> smgrwrite()/extend() aren't accounted anywhere as far as I can tell.
>
>
> I think we also count things as writes that aren't writes: mdtruncate()
> is AFAICT counted as one backend write for each segment. Which seems
> weird to me.

It's at least slightly weird :) Might it be worth counting truncate
events separately?

> Lastly, I don't understand what the point of sending fixed size stats,
> like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> I don't like it's architecture, we obviously need something like pgstat
> to handle variable amounts of stats (database, table level etc
> stats). But that doesn't at all apply to these types of global stats.

That part has annoyed me as well a few times. +1 for just moving that
into a global shared memory. Given that we don't really care about
things being in sync between those different counters *or* if we loose
a bit of data (which the stats collector is designed to do), we could
even do that without a lock?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 曾文旌 (义从) 2020-01-25 15:15:48 Re: [Proposal] Global temporary tables
Previous Message Dean Rasheed 2020-01-25 14:18:33 Re: Greatest Common Divisor