Quick Links

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject:	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date:	2023-01-13 02:19:36
Message-ID:	CAAKRu_bTjvHSd5tWWQkW8SWY-5frVp_vOJpFD+-V7jJG1Rf_Dw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Attached is v46.

On Wed, Dec 28, 2022 at 6:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-10-06 13:42:09 -0400, Melanie Plageman wrote:
> > > Additionally, some minor notes:
> > >
> > > - Since the stats are counting blocks, it would make sense to prefix the view columns with "blks_", and word them in the past tense (to match current style), i.e. "blks_written", "blks_read", "blks_extended", "blks_fsynced" (realistically one would combine this new view with other data e.g. from pg_stat_database or pg_stat_statements, which all use the "blks_" prefix, and stop using pg_stat_bgwriter for this which does not use such a prefix)
> >
> > I have changed the column names to be in the past tense.
>
> For a while I was convinced by the consistency argument (after Melanie
> pointing it out to me). But the more I look, the less convinced I am. The
> existing IO related stats in pg_stat_database, pg_stat_bgwriter aren't past
> tense, just the ones in pg_stat_statements. pg_stat_database uses past tense
> for tup_*, but not xact_*, deadlocks, checksum_failures etc.
>
> And even pg_stat_statements isn't consistent about it - otherwise it'd be
> 'planned' instead of 'plans', 'called' instead of 'calls' etc.
>
> I started to look at the naming "tense" issue again, after I got "confused"
> about "extended", because that somehow makes me think about more detailed
> stats or such, rather than files getting extended.
>
> ISTM that 'evictions', 'extends', 'fsyncs', 'reads', 'reuses', 'writes' are
> clearer than the past tense versions, and about as consistent with existing
> columns.

I have updated the column names to the above recommendation.

On Wed, Jan 11, 2023 at 11:32 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> For some reason cfbot is not able to apply this patch as in [1],
> please have a look and post an updated patch if required:
> === Applying patches on top of PostgreSQL commit ID
> 3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
> === applying patch
> ./v45-0001-pgindent-and-some-manual-cleanup-in-pgstat-relat.patch
> patching file src/backend/storage/buffer/bufmgr.c
> patching file src/backend/storage/buffer/localbuf.c
> patching file src/backend/utils/activity/pgstat.c
> patching file src/backend/utils/activity/pgstat_relation.c
> patching file src/backend/utils/adt/pgstatfuncs.c
> patching file src/include/pgstat.h
> patching file src/include/utils/pgstat_internal.h
> === applying patch ./v45-0002-pgstat-Infrastructure-to-track-IO-operations.patch
> gpatch: **** Only garbage was found in the patch input.
>
> [1] - http://cfbot.cputube.org/patch_41_3272.log
>

This was an issue with cfbot that Thomas has now fixed as he describes
in [1].

On Wed, Jan 11, 2023 at 4:58 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > Subject: [PATCH v45 4/5] Add system view tracking IO ops per backend type
>
> The patch can/will fail with:
>
> CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
> +WARNING: tablespaces created by regression test cases should have names starting with "regress_"
>
> CREATE TABLESPACE test_stats LOCATION '';
> +WARNING: tablespaces created by regression test cases should have names starting with "regress_"
>
> (I already sent patches to address the omission in cirrus.yml)

Thanks. I've fixed this
I make a tablespace in amcheck -- are there recommendations for naming
tablespaces in contrib also?

>
> 1760 : errhint("Target must be \"archiver\", \"io\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
> => Do you want to put these in order?

Thanks. Fixed.

> pgstat_get_io_op_name() isn't currently being hit by tests; actually,
> it's completely unused.

Deleted it.

> FlushRelationBuffers() isn't being hit for local buffers.

I added a test.

> > + <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> > + <entry>
> > + One row per backend type, context, target object combination showing
> > + cluster-wide I/O statistics.
>
> I suggest: "One row for each combination of of .."

I have made this change.

> > + The <structname>pg_stat_io</structname> and
> > + <structname>pg_statio_</structname> set of views are especially useful for
> > + determining the effectiveness of the buffer cache. When the number of actual
> > + disk reads is much smaller than the number of buffer hits, then the cache is
> > + satisfying most read requests without invoking a kernel call.
>
> I would change this say "Postgres' own buffer cache is satisfying ..."

So, this is existing copy to which I added the pg_stat_io view name and
re-flowed the indentation.
However, I think your suggestions are a good idea, so I've taken them
and just rewritten this paragraph altogether.

>
> > However, these
> > + statistics do not give the entire story: due to the way in which
> > + <productname>PostgreSQL</productname> handles disk I/O, data that is not in
> > + the <productname>PostgreSQL</productname> buffer cache might still reside in
> > + the kernel's I/O cache, and might therefore still be fetched without
>
> I suggest to refer to "the kernel's page cache"

same applies here.

>
> > + The <structname>pg_stat_io</structname> view will contain one row for each
> > + backend type, I/O context, and target I/O object combination showing
> > + cluster-wide I/O statistics. Combinations which do not make sense are
> > + omitted.
>
> "..for each combination of .."

I have changed this.

>
> > + <varname>io_context</varname> for a type of I/O operation. For
>
> "for I/O operations"

So I actually mean for a type of I/O operation -- that is, relation data
is normally written to a shared buffer but sometimes we bypass shared
buffers and just call write and sometimes we use a buffer access
strategy and write it to a special ring buffer (made up of buffers
stolen from shared buffers, but still). So I don't want to say "for I/O
operations" because I think that would imply that writes of relation
data will always be in the same IO Context.

>
> > + <literal>vacuum</literal>: I/O operations done outside of shared
> > + buffers incurred while vacuuming and analyzing permanent relations.
>
> s/incurred/performed/

I changed this.

>
> > + <literal>bulkread</literal>: Qualifying large read I/O operations
> > + done outside of shared buffers, for example, a sequential scan of a
> > + large table.
>
> I don't think it's correct to say that it's "outside of" shared-buffers.

I suppose "outside of" gives the wrong idea. But I need to make clear
that this I/O is to and from buffers which are not a part of shared
buffers right now -- they may still be accessible from the same data
structures which access shared buffers but they are currently being used
in a different way.

> s/Qualifying/Certain/

I feel like qualifying is more specific than certain, but I would be open
to changing it if there was a specific reason you don't like it.

>
> > + <literal>bulkwrite</literal>: Qualifying large write I/O operations
> > + done outside of shared buffers, such as <command>COPY</command>.
>
> Same
>
> > + Target object of an I/O operation. Possible values are:
> > + <itemizedlist>
> > + <listitem>
> > + <para>
> > + <literal>relation</literal>: This includes permanent relations.
>
> It says "includes permanent" but what seems to mean is that it
> "exclusive of temporary relations".

I've changed this.

>
> > + <row>
> > + <entry role="catalog_table_entry">
> > + <para role="column_definition">
> > + <structfield>read</structfield> <type>bigint</type>
> > + </para>
> > + <para>
> > + Number of read operations in units of <varname>op_bytes</varname>.
>
> This looks too much like it means "bytes".
> Should say: "in number of blocks of size >op_bytes<"
>
> But wait - is it the number of read operations "in units of op_bytes"
> (which would means this already multiplied by op_bytes, and is in units
> of bytes).
>
> Or the "number of read operations" *of* op_bytes chunks ? Which would
> mean this is a "pure" number, and could be multipled by op_bytes to
> obtain a size in bytes.

It is the number of read operations of op_bytes size -- thanks so much
for pointing this out. The wording was really unclear.
The idea is that you can do something like:
SELECT pg_size_pretty(reads * op_bytes) FROM pg_stat_io;
and get it in bytes.

The view will contain other types of IO that are not in BLCKSZ chunks,
which is where this column will be handy.

>
> > + Number of write operations in units of <varname>op_bytes</varname>.
>
> > + Number of relation extend operations in units of
> > + <varname>op_bytes</varname>.
>
> same
>
> > + In <varname>io_context</varname> <literal>normal</literal>, this counts
> > + the number of times a block was evicted from a buffer and replaced with
> > + another block. In <varname>io_context</varname>s
> > + <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
> > + <literal>vacuum</literal>, this counts the number of times a block was
> > + evicted from shared buffers in order to add the shared buffer to a
> > + separate size-limited ring buffer.
>
> This never defines what "evicted" means. Does it mea that a dirty
> buffer was written out ?

Thanks. I've updated this.

>
> > + The number of times an existing buffer in a size-limited ring buffer
> > + outside of shared buffers was reused as part of an I/O operation in the
> > + <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
> > + <literal>vacuum</literal> <varname>io_context</varname>s.
>
> Maybe say "as part of a bulk I/O operation (bulkread, bulkwrite, or
> vacuum)."

I've changed this.

>
> > + <para>
> > + <structname>pg_stat_io</structname> can be used to inform database tuning.
>
> > + For example:
> > + <itemizedlist>
> > + <listitem>
> > + <para>
> > + A high <varname>evicted</varname> count can indicate that shared buffers
> > + should be increased.
> > + </para>
> > + </listitem>
> > + <listitem>
> > + <para>
> > + Client backends rely on the checkpointer to ensure data is persisted to
> > + permanent storage. Large numbers of <varname>files_synced</varname> by
> > + <literal>client backend</literal>s could indicate a misconfiguration of
> > + shared buffers or of checkpointer. More information on checkpointer
>
> of *the* checkpointer
>
> > + Normally, client backends should be able to rely on auxiliary processes
> > + like the checkpointer and background writer to write out dirty data as
>
> *the* bg writer
>
> > + much as possible. Large numbers of writes by client backends could
> > + indicate a misconfiguration of shared buffers or of checkpointer. More
>
> *the* ckpointer

I've made most of these changes.

> Should this link to various docs for checkpointer/bgwriter ?

I couldn't find docs related to tuning checkpointer outside of the WAL
configuration docs. There is the docs page for the CHECKPOINT command --
but I don't think that is very relevant here.

> Maybe the docs for ALTER/COPY/VACUUM/CREATE/etc should be updated to
> refer to some central description of ring buffers. Maybe something
> should be included to the appendix.

I agree it would be nice to explain Buffer Access Strategies in the docs.

- Melanie

[1] https://www.postgresql.org/message-id/CA%2BhUKGLiY1e%2B1%3DpB7hXJOyGj1dJOfgde%2BHmiSnv3gDKayUFJMA%40mail.gmail.com

Attachment	Content-Type	Size
v46-0001-pgindent-and-some-manual-cleanup-in-pgstat-relat.patch	text/x-patch	6.0 KB
v46-0005-pg_stat_io-documentation.patch	text/x-patch	13.4 KB
v46-0004-Add-system-view-tracking-IO-ops-per-backend-type.patch	text/x-patch	33.9 KB
v46-0003-pgstat-Count-IO-for-relations.patch	text/x-patch	22.1 KB
v46-0002-pgstat-Infrastructure-to-track-IO-operations.patch	text/x-patch	27.7 KB

In response to

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) at 2023-01-11 21:58:51 from Justin Pryzby

Responses

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) at 2023-01-13 05:23:05 from Justin Pryzby

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Smith	2023-01-13 02:25:43	Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message	David Rowley	2023-01-13 02:18:20	Re: Todo: Teach planner to evaluate multiple windows in the optimal order