Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-11-30 01:12:47
Message-ID: CAAKRu_adS9h=BrSpDRwP1DHBAQUHqQ04P07EoLtxg+JOWKEv6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review, Maciek!

I've attached a new version 39 of the patch which addresses your docs
feedback from this email as well as docs feedback from Andres in [1] and
Justin in [2].

I've made some additional code changes addressing a few of their other
points as well, and I've moved the verify_heapam test to a plain sql
test in contrib/amcheck instead of putting it in the perl test.

This patchset also includes various cleanup, pgindenting, and addressing
the sgml indentation issue brought up in the thread.

On Mon, Nov 7, 2022 at 1:26 PM Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> wrote:
>
> On Thu, Nov 3, 2022 at 10:00 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>
> > > I'm reviewing the rendered docs now, and I noticed sentences like this
> > > are a bit hard to scan: they force the reader to parse a big list of
> > > backend types before even getting to the meat of what this is talking
> > > about. Should we maybe reword this so that the backend list comes at
> > > the end of the sentence? Or maybe even use a list (e.g., like in the
> > > "state" column description in pg_stat_activity)?
> >
> > Good idea with the bullet points.
> > For the lengthy lists, I've added bullet point lists to the docs for
> > several of the columns. It is quite long now but, hopefully, clearer?
> > Let me know if you think it improves the readability.
>
> Hmm, I should have tried this before suggesting it. I think the lists
> break up the flow of the column description too much. What do you
> think about the attached (on top of your patches--attaching it as a
> .diff to hopefully not confuse cfbot)? I kept the lists for backend
> types but inlined the others as a middle ground. I also added a few
> omitted periods and reworded "read plus extended" to avoid starting
> the sentence with a (lowercase) varname (I think in general it's fine
> to do that, but the more complicated sentence structure here makes it
> easier to follow if the sentence starts with a capital).
>
> Alternately, what do you think about pulling equivalencies to existing
> views out of the main column descriptions, and adding them after the
> main table as a sort of footnote? Most view docs don't have anything
> like that, but pg_stat_replication does and it might be a good pattern
> to follow.
>
> Thoughts?

Thanks for including a patch!
In the attached v39, I've taken your suggestion of flattening some of
the lists and done some rewording as well. I have also moved the note
about equivalence with pg_stat_statements columns to the
pg_stat_statements documentation. The result is quite a bit different
than what I had before, so I would be interested to hear your thoughts.

My concern with the blue "note" section like you mentioned is that it
would be harder to read the lists of backend types than it was in the
tabular format.

> > > + <varname>io_context</varname>s. When a <quote>Buffer Access
> > > + Strategy</quote> reuses a buffer in the strategy ring, it must evict its
> > > + contents, incrementing <varname>reused</varname>. When a <quote>Buffer
> > > + Access Strategy</quote> adds a new shared buffer to the strategy ring
> > > + and this shared buffer is occupied, the <quote>Buffer Access
> > > + Strategy</quote> must evict the contents of the shared buffer,
> > > + incrementing <varname>evicted</varname>.
> > >
> > > I think the parallel phrasing here makes this a little hard to follow.
> > > Specifically, I think "must evict its contents" for the strategy case
> > > sounds like a bad thing, but in fact this is a totally normal thing
> > > that happens as part of strategy access, no? The idea is you probably
> > > won't need that buffer again, so it's fine to evict it. I'm not sure
> > > how to reword, but I think the current phrasing is misleading.
> >
> > I had trouble rephrasing this. I changed a few words. I see what you
> > mean. It is worth noting that reusing strategy buffers when there are
> > buffers on the freelist may not be the best behavior, so I wouldn't
> > necessarily consider "reused" a good thing. However, I'm not sure how
> > much the user could really do about this. I would at least like this
> > phrasing to be clear (evicted is for shared buffers, reused is for
> > strategy buffers), so, perhaps this section requires more work.
>
> Oh, I see. I think the updated wording works better. Although I think
> we can drop the quotes around "Buffer Access Strategy" here. They're
> useful when defining the term originally, but after that I think it's
> clearer to use the term unquoted.

Thanks! I've fixed this.

> Just to understand this better myself, though: can you clarify when
> "reused" is not a normal, expected part of the strategy execution? I
> was under the impression that a ring buffer is used because each page
> is needed only "once" (i.e., for one set of operations) for the
> command using the strategy ring buffer. Naively, in that situation, it
> seems better to reuse a no-longer-needed buffer than to claim another
> buffer from the freelist (where other commands may eventually make
> better use of it).

You are right: reused is a normal, expected part of strategy
execution. And you are correct: the idea behind reusing existing
strategy buffers instead of taking buffers off the freelist is to leave
those buffers for blocks that we might expect to be accessed more than
once.

In practice, however, if you happen to not be using many shared buffers,
and then do a large COPY, for example, you will end up doing a bunch of
writes (in order to reuse the strategy buffers) that you perhaps didn't
need to do at that time had you leveraged the freelist. I think the
decision about which tradeoff to make is quite contentious, though.

> Some more notes on the docs patch:
>
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>io_context</structfield> <type>text</type>
> + </para>
> + <para>
> + The context or location of an IO operation.
> + </para>
> + <itemizedlist>
> + <listitem>
> + <para>
> + <varname>io_context</varname> <literal>buffer pool</literal> refers to
> + IO operations on data in both the shared buffer pool and process-local
> + buffer pools used for temporary relation data.
> + </para>
> + <para>
> + Operations on temporary relations are tracked in
> + <varname>io_context</varname> <literal>buffer pool</literal> and
> + <varname>io_object</varname> <literal>temp relation</literal>.
> + </para>
> + <para>
> + Operations on permanent relations are tracked in
> + <varname>io_context</varname> <literal>buffer pool</literal> and
> + <varname>io_object</varname> <literal>relation</literal>.
> + </para>
> + </listitem>
>
> For this column, you repeat "io_context" in the list describing the
> possible values of the column. Enum-style columns in other tables
> don't do that (e.g., the pg_stat_activty "state" column). I think it
> might read better to omit "io_context" from the list.

I changed this.

> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>io_object</structfield> <type>text</type>
> + </para>
> + <para>
> + Object operated on in a given <varname>io_context</varname> by a given
> + <varname>backend_type</varname>.
> + </para>
>
> Is this a fixed set of objects we should list, like for io_context?

I've added this.

- Melanie

[1] https://www.postgresql.org/message-id/20221121003815.qnwlnz2lhkow2e5w%40awork3.anarazel.de
[2] https://www.postgresql.org/message-id/20221123054329.GG11463%40telsasoft.com

Attachment Content-Type Size
v39-0003-Aggregate-IO-operation-stats-per-BackendType.patch application/octet-stream 22.6 KB
v39-0002-Track-IO-operation-statistics-locally.patch application/octet-stream 28.4 KB
v39-0001-Remove-BufferAccessStrategyData-current_was_in_r.patch application/octet-stream 5.4 KB
v39-0004-Add-system-view-tracking-IO-ops-per-backend-type.patch application/octet-stream 59.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-30 01:44:34 Re: Strange failure on mamba
Previous Message Michael Paquier 2022-11-30 01:12:00 Re: User functions for building SCRAM secrets