Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-11-03 17:00:24
Message-ID: CAAKRu_bb8GupZOLBasf2XkHBrkj8ow8=ZOmcBOj4_G90h2TFKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

v37 attached

On Sun, Oct 30, 2022 at 9:09 PM Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> wrote:
>
> On Wed, Oct 26, 2022 at 10:55 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>
> + The <structname>pg_statio_</structname> and
> + <structname>pg_stat_io</structname> views are primarily useful to determine
> + the effectiveness of the buffer cache. When the number of actual disk reads
>
> Totally nitpicking, but this reads a little funny to me. Previously
> the trailing underscore suggested this is a group, and now with
> pg_stat_io itself added (stupid question: should this be
> "pg_statio"?), it sounds like we're talking about two views:
> pg_stat_io and "pg_statio_". Maybe something like "The pg_stat_io view
> and the pg_statio_ set of views are primarily..."?

I decided not to call it pg_statio because all of the other stats views
have an underscore after stat and I thought it was an opportunity to be
consistent with them.

> + by that backend type in that IO context. Currently only a subset of IO
> + operations are tracked here. WAL IO, IO on temporary files, and some forms
> + of IO outside of shared buffers (such as when building indexes or moving a
> + table from one tablespace to another) could be added in the future.
>
> Again nitpicking, but should this be "may be added"? I think "could"
> suggests the possibility of implementation, whereas "may" feels more
> like a hint as to how the feature could evolve.

I've adopted the wording you suggested.

> + portion of the main shared buffer pool. This pattern is called a
> + <quote>Buffer Access Strategy</quote> in the
> + <productname>PostgreSQL</productname> source code and the fixed-size
> + ring buffer is referred to as a <quote>strategy ring buffer</quote> for
> + the purposes of this view's documentation.
> + </para></entry>
>
> Nice, I think this explanation is very helpful. You also use the term
> "strategy context" and "strategy operation" below. I think it's fairly
> obvious what those mean, but pointing it out in case we want to note
> that here, too.

Thanks! I've added definitions of those as well.

> + <varname>read</varname> and <varname>extended</varname> for
>
> Maybe "plus" instead of "and" here for clarity (I'm assuming that's
> what the "and" means)?

Modified this -- in some cases by adding the lists mentioned below

> + <varname>backend_type</varname>s <literal>autovacuum launcher</literal>,
> + <literal>autovacuum worker</literal>, <literal>client backend</literal>,
> + <literal>standalone backend</literal>, <literal>background
> + worker</literal>, and <literal>walsender</literal> for all
> + <varname>io_context</varname>s is similar to the sum of
>
> I'm reviewing the rendered docs now, and I noticed sentences like this
> are a bit hard to scan: they force the reader to parse a big list of
> backend types before even getting to the meat of what this is talking
> about. Should we maybe reword this so that the backend list comes at
> the end of the sentence? Or maybe even use a list (e.g., like in the
> "state" column description in pg_stat_activity)?

Good idea with the bullet points.
For the lengthy lists, I've added bullet point lists to the docs for
several of the columns. It is quite long now but, hopefully, clearer?
Let me know if you think it improves the readability.

> + <varname>heap_blks_read</varname>, <varname>idx_blks_read</varname>,
> + <varname>tidx_blks_read</varname>, and
> + <varname>toast_blks_read</varname> in <link
> + linkend="monitoring-pg-statio-all-tables-view">
> + <structname>pg_statio_all_tables</structname></link>. and
> + <varname>blks_read</varname> from <link
>
> I think that's a stray period before the "and."

Fixed!

> + Normal client backends should be able to rely on maintenance processes
> + like the checkpointer and background writer to write out dirty data as
>
> Nice--it's great to see this mentioned. But I think these are
> generally referred to as "auxiliary" not "maintenance" processes, no?

Thanks! Fixed.

> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>bytes_conversion</structfield> <type>bigint</type>
> + </para>
>
> I think this general approach works (instead of unit). I'm not wild
> about the name, but I don't really have a better suggestion. Maybe
> "op_bytes" (since each cell is counting the number of I/O operations)?
> But I think bytes_conversion is okay.

I really like op_bytes and have changed it to this. Thanks for the
suggestion!

> Also, is this (in the middle of the table) the right place for this
> column? I would have expected to see it before or after all the actual
> I/O op cells.

I put it after read, write, and extend columns because it applies to
them. It doesn't apply to files_synced. For reused and evicted, I didn't
think bytes reused and evicted made sense. Also, when we add non-block
oriented IO, reused and evicted won't be used but op_bytes will be. So I
thought it made more sense to place it after the operations it applies
to.

> + <varname>io_context</varname>s. When a <quote>Buffer Access
> + Strategy</quote> reuses a buffer in the strategy ring, it must evict its
> + contents, incrementing <varname>reused</varname>. When a <quote>Buffer
> + Access Strategy</quote> adds a new shared buffer to the strategy ring
> + and this shared buffer is occupied, the <quote>Buffer Access
> + Strategy</quote> must evict the contents of the shared buffer,
> + incrementing <varname>evicted</varname>.
>
> I think the parallel phrasing here makes this a little hard to follow.
> Specifically, I think "must evict its contents" for the strategy case
> sounds like a bad thing, but in fact this is a totally normal thing
> that happens as part of strategy access, no? The idea is you probably
> won't need that buffer again, so it's fine to evict it. I'm not sure
> how to reword, but I think the current phrasing is misleading.

I had trouble rephrasing this. I changed a few words. I see what you
mean. It is worth noting that reusing strategy buffers when there are
buffers on the freelist may not be the best behavior, so I wouldn't
necessarily consider "reused" a good thing. However, I'm not sure how
much the user could really do about this. I would at least like this
phrasing to be clear (evicted is for shared buffers, reused is for
strategy buffers), so, perhaps this section requires more work.

> + The number of times a <literal>bulkread</literal> found the current
> + buffer in the fixed-size strategy ring dirty and requiring flush.
>
> Maybe "...found ... to be dirty..."?

Changed to this wording.

> + frequent vacuuming or more aggressive autovacuum settings, as buffers are
> + dirtied during a bulkread operation when updating the hint bit or when
> + performing on-access pruning.
>
> Are there docs to cross-reference here, especially for pruning? I
> couldn't find much except a few un-explained mentions in the page
> layout docs [2], and most of the search results refer to partition
> pruning. Searching for hint bits at least gives some info in blog
> posts and the wiki.

yes, I don't see anything explaining this either -- below the page
layout it discusses tuple layout but that doesn't mention hint bits.

> + again. A high number of repossessions is a sign of contention for the
> + blocks operated on by the strategy operation.
>
> This (and in general the repossession description) makes sense, but
> I'm not sure what to do with the information. Maybe Andres is right
> that we could skip this in the first version?

I've removed repossessed and rejected in attached v37. I am a bit sad
about this because I don't see a good way forward and I think those
could be useful for users.

I have added the new column Andres recommended in [1] ("io_object") to
clarify temp and local buffers and pave the way for bypass IO (IO not
done through a buffer pool), which can be done on temp or permanent
files for temp or permanent relations, and spill file IO which is done
on temporary files but isn't related to temporary tables.

IOObject has increased the memory footprint and complexity of the code
around tracking and accumulating the statistics, though it has not
increased the number of rows in the view.

One question I still have about this additional dimension is how much
enumeration we need of the various combinations of IO operations, IO
objects, IO ops, and backend types which are allowed and not allowed.
Currently because it is only valid to operate on both IOOBJECT_RELATION
and IOOBJECT_TEMP_RELATION in IOCONTEXT_BUFFER_POOL, the changes to the
various functions asserting and validating what is "allowed" in terms of
combinations of ops, objects, contexts, and backend types aren't much
different than they were without IO Object. However, once we begin
adding other objects and contexts, we will need to make this logic more
comprehensive. I'm not sure whether or not I should do that
preemptively.

> On Mon, Oct 24, 2022 at 12:39 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > > I don't quite follow this: does this mean that I should expect
> > > 'reused' and 'evicted' to be equal in the 'shared' context, because
> > > they represent the same thing? Or will 'reused' just be null because
> > > it's not distinct from 'evicted'? It looks like it's null right now,
> > > but I find the wording here confusing.
> >
> > You should only see evictions when the strategy evicts shared buffers
> > and reuses when the strategy evicts existing strategy buffers.
> >
> > How about this instead in this docs?
> >
> > the number of times an existing buffer in the strategy ring was reused
> > as part of an operation in the <literal>bulkread</literal>,
> > <literal>bulkwrite</literal>, or <literal>vacuum</literal>
> > <varname>io_context</varname>s. when a buffer access strategy
> > <quote>reuses</quote> a buffer in the strategy ring, it must evict its
> > contents, incrementing <varname>reused</varname>. when a buffer access
> > strategy adds a new shared buffer to the strategy ring and this shared
> > buffer is occupied, the buffer access strategy must evict the contents
> > of the shared buffer, incrementing <varname>evicted</varname>.
>
> It looks like you ended up with different wording in the patch, but
> both this explanation and what's in the patch now make sense to me.
> Thanks for clarifying.

Yes, I tried to rework it and your suggestion and feedback was very
helpful.

> Also, I noticed that the commit message explains missing rows for some
> backend_type / io_context combinations and NULL (versus 0) in some
> cells, but the docs don't really talk about that. Do you think that
> should be in there as well?

Thanks for pointing this out. I have added notes about this to the
relevant columns in the docs.

- Melanie

[1] https://www.postgresql.org/message-id/20221026185808.4qnxowtn35x43u7u%40awork3.anarazel.de

Attachment Content-Type Size
v37-0002-Track-IO-operation-statistics-locally.patch application/octet-stream 30.8 KB
v37-0001-Remove-BufferAccessStrategyData-current_was_in_r.patch application/octet-stream 3.8 KB
v37-0004-Add-system-view-tracking-IO-ops-per-backend-type.patch application/octet-stream 52.7 KB
v37-0003-Aggregate-IO-operation-stats-per-BackendType.patch application/octet-stream 22.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2022-11-03 17:04:44 remove unnecessary assignment to tmask in DecodeDateTime
Previous Message Jérémie Grauer 2022-11-03 15:54:13 new option to allow pg_rewind to run without full_page_writes