Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2023-01-11 21:58:51
Message-ID: 20230111215850.GO9837@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Subject: [PATCH v45 4/5] Add system view tracking IO ops per backend type

The patch can/will fail with:

CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+WARNING: tablespaces created by regression test cases should have names starting with "regress_"

CREATE TABLESPACE test_stats LOCATION '';
+WARNING: tablespaces created by regression test cases should have names starting with "regress_"

(I already sent patches to address the omission in cirrus.yml)

1760 : errhint("Target must be \"archiver\", \"io\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
=> Do you want to put these in order?

pgstat_get_io_op_name() isn't currently being hit by tests; actually,
it's completely unused.

FlushRelationBuffers() isn't being hit for local buffers.

> + <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> + <entry>
> + One row per backend type, context, target object combination showing
> + cluster-wide I/O statistics.

I suggest: "One row for each combination of of .."

> + The <structname>pg_stat_io</structname> and
> + <structname>pg_statio_</structname> set of views are especially useful for
> + determining the effectiveness of the buffer cache. When the number of actual
> + disk reads is much smaller than the number of buffer hits, then the cache is
> + satisfying most read requests without invoking a kernel call.

I would change this say "Postgres' own buffer cache is satisfying ..."

> However, these
> + statistics do not give the entire story: due to the way in which
> + <productname>PostgreSQL</productname> handles disk I/O, data that is not in
> + the <productname>PostgreSQL</productname> buffer cache might still reside in
> + the kernel's I/O cache, and might therefore still be fetched without

I suggest to refer to "the kernel's page cache"

> + The <structname>pg_stat_io</structname> view will contain one row for each
> + backend type, I/O context, and target I/O object combination showing
> + cluster-wide I/O statistics. Combinations which do not make sense are
> + omitted.

"..for each combination of .."

> + <varname>io_context</varname> for a type of I/O operation. For

"for I/O operations"

> + <literal>vacuum</literal>: I/O operations done outside of shared
> + buffers incurred while vacuuming and analyzing permanent relations.

s/incurred/performed/

> + <literal>bulkread</literal>: Qualifying large read I/O operations
> + done outside of shared buffers, for example, a sequential scan of a
> + large table.

I don't think it's correct to say that it's "outside of" shared-buffers.
s/Qualifying/Certain/

> + <literal>bulkwrite</literal>: Qualifying large write I/O operations
> + done outside of shared buffers, such as <command>COPY</command>.

Same

> + Target object of an I/O operation. Possible values are:
> + <itemizedlist>
> + <listitem>
> + <para>
> + <literal>relation</literal>: This includes permanent relations.

It says "includes permanent" but what seems to mean is that it
"exclusive of temporary relations".

> + <row>
> + <entry role="catalog_table_entry">
> + <para role="column_definition">
> + <structfield>read</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of read operations in units of <varname>op_bytes</varname>.

This looks too much like it means "bytes".
Should say: "in number of blocks of size >op_bytes<"

But wait - is it the number of read operations "in units of op_bytes"
(which would means this already multiplied by op_bytes, and is in units
of bytes).

Or the "number of read operations" *of* op_bytes chunks ? Which would
mean this is a "pure" number, and could be multipled by op_bytes to
obtain a size in bytes.

> + Number of write operations in units of <varname>op_bytes</varname>.

> + Number of relation extend operations in units of
> + <varname>op_bytes</varname>.

same

> + In <varname>io_context</varname> <literal>normal</literal>, this counts
> + the number of times a block was evicted from a buffer and replaced with
> + another block. In <varname>io_context</varname>s
> + <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
> + <literal>vacuum</literal>, this counts the number of times a block was
> + evicted from shared buffers in order to add the shared buffer to a
> + separate size-limited ring buffer.

This never defines what "evicted" means. Does it mea that a dirty
buffer was written out ?

> + The number of times an existing buffer in a size-limited ring buffer
> + outside of shared buffers was reused as part of an I/O operation in the
> + <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
> + <literal>vacuum</literal> <varname>io_context</varname>s.

Maybe say "as part of a bulk I/O operation (bulkread, bulkwrite, or
vacuum)."

> + <para>
> + <structname>pg_stat_io</structname> can be used to inform database tuning.

> + For example:
> + <itemizedlist>
> + <listitem>
> + <para>
> + A high <varname>evicted</varname> count can indicate that shared buffers
> + should be increased.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + Client backends rely on the checkpointer to ensure data is persisted to
> + permanent storage. Large numbers of <varname>files_synced</varname> by
> + <literal>client backend</literal>s could indicate a misconfiguration of
> + shared buffers or of checkpointer. More information on checkpointer

of *the* checkpointer

> + Normally, client backends should be able to rely on auxiliary processes
> + like the checkpointer and background writer to write out dirty data as

*the* bg writer

> + much as possible. Large numbers of writes by client backends could
> + indicate a misconfiguration of shared buffers or of checkpointer. More

*the* ckpointer

Should this link to various docs for checkpointer/bgwriter ?

Maybe the docs for ALTER/COPY/VACUUM/CREATE/etc should be updated to
refer to some central description of ring buffers. Maybe something
should be included to the appendix.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-01-11 22:10:07 Re: pgsql: Doc: add XML ID attributes to <sectN> and <varlistentry> tags.
Previous Message Magnus Hagander 2023-01-11 21:58:47 Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert