Re: New statistics for tuning WAL buffer size

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: New statistics for tuning WAL buffer size
Date: 2020-10-06 06:57:15
Message-ID: 550d4cadbdb3c8b2e0248ff483eac089@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view.
I'm thinking to add the following statistics. Please let me know your
thoughts.

1. Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN, auto_explain,
autovacuum
and pg_stat_statements now, if users want to see the performance trends
for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal
view?

2. Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

> Just idea; it may be worth exposing the number of when new WAL file is
> created and zero-filled. This initialization may have impact on the
> performance of write-heavy workload generating lots of WAL. If this
> number is reported high, to reduce the number of this initialization,
> we can tune WAL-related parameters so that more "recycled" WAL files
> can be hold.

3. Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high relative to the startup time, "wal_segment_size"
must be bigger.

4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.

5. Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher, users can detect the possibility of
disk failure.

Since users can see how much flash time occupies of the query execution
time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-10-06 07:53:33 Is the doc for SHOW somewhat stale?
Previous Message Bharath Rupireddy 2020-10-06 06:44:22 Re: Logical Replication - detail message with names of missing columns