Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date: 2020-10-20 03:46:55
Message-ID: CAA4eK1JbbP-QmZWiXwK+H-ZauJCGR6vbAVoGdyx_TTxRPuXzhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
> Hi,
>
> I think we need to add some statistics to pg_stat_wal view.
>
> Although there are some parameter related WAL,
> there are few statistics for tuning them.
>
> I think it's better to provide the following statistics.
> Please let me know your comments.
>
> ```
> postgres=# SELECT * from pg_stat_wal;
> -[ RECORD 1 ]-------+------------------------------
> wal_records | 2000224
> wal_fpi | 47
> wal_bytes | 248216337
> wal_buffers_full | 20954
> wal_init_file | 8
> wal_write_backend | 20960
> wal_write_walwriter | 46
> wal_write_time | 51
> wal_sync_backend | 7
> wal_sync_walwriter | 8
> wal_sync_time | 0
> stats_reset | 2020-10-20 11:04:51.307771+09
> ```
>
> 1. Basic statistics of WAL activity
>
> - wal_records: Total number of WAL records generated
> - wal_fpi: Total number of WAL full page images generated
> - wal_bytes: Total amount of WAL bytes generated
>
> To understand DB's performance, first, we will check the performance
> trends for the entire database instance.
> For example, if the number of wal_fpi becomes higher, users may tune
> "wal_compression", "checkpoint_timeout" and so on.
>
> Although users can check the above statistics via EXPLAIN, auto_explain,
> autovacuum and pg_stat_statements now,
> if users want to see the performance trends for the entire database,
> they must recalculate the statistics.
>

Here, do you mean to say 'entire cluster' instead of 'entire database'
because it seems these stats are getting collected for the entire
cluster?

> I think it is useful to add the sum of the basic statistics.
>

There is an argument that it is better to view these stats at the
statement-level so that one can know which statements are causing most
WAL and then try to rate-limit them if required in the application and
anyway they can get the aggregate of all the WAL if they want. We have
added these stats in PG-13, so do we have any evidence that the
already added stats don't provide enough information? I understand
that you are trying to display the accumulated stats here which if
required users/DBA need to compute with the currently provided stats.
OTOH, sometimes adding more ways to do some things causes difficulty
for users to understand and learn.

I see that we also need to add extra code to capture these stats (some
of which is in performance-critical path especially in
XLogInsertRecord) which again makes me a bit uncomfortable. It might
be that it is all fine as it is very important to collect these stats
at cluster-level in spite that the same information can be gathered at
statement-level to help customers but I don't see a very strong case
for that in your proposal.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2020-10-20 04:14:42 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Kyotaro Horiguchi 2020-10-20 03:07:23 Re: Transactions involving multiple postgres foreign servers, take 2