Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date: 2020-11-12 05:58:59
Message-ID: e28c5f5e-363e-70c1-a200-28a8798ed8e2@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/11/06 10:25, Masahiro Ikeda wrote:
> On 2020-10-30 11:50, Fujii Masao wrote:
>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>> Hi,
>>>
>>> Thanks for your comments and advice. I updated the patch.
>>>
>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote in
>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>> > of which is in performance-critical path especially in
>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>> > be that it is all fine as it is very important to collect these stats
>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>> > statement-level to help customers but I don't see a very strong case
>>>>> > for that in your proposal.
>>>>
>>>> We should avoid that duplication as possible even if the both number
>>>> are important.
>>>>
>>>>> Also about performance, I thought there are few impacts because it
>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>> value which already collects these stats, there is no impact in
>>>>> XLogInsertRecord.
>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>> value?
>>>>
>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>> takes the difference of that values between two successive calls.
>>>>
>>>> WalUsage prevWalUsage;
>>>> ...
>>>> pgstat_send_wal()
>>>> {
>>>> ..
>>>>    /* fill in some values using pgWalUsage */
>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>> ...
>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>
>>>>    /* remember the current numbers */
>>>>    prevWalUsage = pgWalUsage;
>>>
>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>> which is already defined and eliminates the extra overhead.
>>
>> +    /* fill in some values using pgWalUsage */
>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>
>> It's better to use WalUsageAccumDiff() here?
>
> Yes, thanks. I fixed it.
>
>> prevWalUsage needs to be initialized with pgWalUsage?
>>
>> +                if (AmWalWriterProcess()){
>> +                    WalStats.m_wal_write_walwriter++;
>> +                }
>> +                else
>> +                {
>> +                    WalStats.m_wal_write_backend++;
>> +                }
>>
>> I think that it's better not to separate m_wal_write_xxx into two for
>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>> counter and make pgstat_send_wal() send also the process type to
>> the stats collector. Then the stats collector can accumulate the counters
>> per process type if necessary. If we adopt this approach, we can easily
>> extend pg_stat_wal so that any fields can be reported per process type.
>
> I'll remove the above source code because these counters are not useful.
>
>
> On 2020-10-30 12:00, Fujii Masao wrote:
>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>> Hi,
>>>
>>> I think we need to add some statistics to pg_stat_wal view.
>>>
>>> Although there are some parameter related WAL,
>>> there are few statistics for tuning them.
>>>
>>> I think it's better to provide the following statistics.
>>> Please let me know your comments.
>>>
>>> ```
>>> postgres=# SELECT * from pg_stat_wal;
>>> -[ RECORD 1 ]-------+------------------------------
>>> wal_records         | 2000224
>>> wal_fpi             | 47
>>> wal_bytes           | 248216337
>>> wal_buffers_full    | 20954
>>> wal_init_file       | 8
>>> wal_write_backend   | 20960
>>> wal_write_walwriter | 46
>>> wal_write_time      | 51
>>> wal_sync_backend    | 7
>>> wal_sync_walwriter  | 8
>>> wal_sync_time       | 0
>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>> ```
>>>
>>> 1. Basic statistics of WAL activity
>>>
>>> - wal_records: Total number of WAL records generated
>>> - wal_fpi: Total number of WAL full page images generated
>>> - wal_bytes: Total amount of WAL bytes generated
>>>
>>> To understand DB's performance, first, we will check the performance
>>> trends for the entire database instance.
>>> For example, if the number of wal_fpi becomes higher, users may tune
>>> "wal_compression", "checkpoint_timeout" and so on.
>>>
>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>> autovacuum and pg_stat_statements now,
>>> if users want to see the performance trends  for the entire database,
>>> they must recalculate the statistics.
>>>
>>> I think it is useful to add the sum of the basic statistics.
>>>
>>>
>>> 2.  WAL segment file creation
>>>
>>> - wal_init_file: Total number of WAL segment files created.
>>>
>>> To create a new WAL file may have an impact on the performance of
>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>> so that more "recycled" WAL files can be held.
>>>
>>>
>>>
>>> 3. Number of when WAL is flushed
>>>
>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>
>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>
>> I just wonder how useful these counters are. Even without these counters,
>> we already know synchronous_commit=off is likely to cause the better
>> performance (but has the risk of data loss). So ISTM that these counters are
>> not so useful when tuning synchronous_commit.
>
> Thanks, my understanding was wrong.
> I agreed that your comments.
>
> I merged the statistics of *_backend and *_walwriter.
> I think the sum of them is useful to calculate the average per write/sync time.
> For example, per write time is equals wal_write_time / wal_write.

Understood.

Thanks for updating the patch!

patching file src/include/catalog/pg_proc.dat
Hunk #1 FAILED at 5491.
1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej

I got this failure when applying the patch. Could you update the patch?

- Number of times WAL data was written to the disk because WAL buffers got full
+ Total number of times WAL data written to the disk because WAL buffers got full

Isn't "was" necessary between "data" and "written"?

+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_bytes</structfield> <type>bigint</type>

Shouldn't the type of wal_bytes be numeric because the total number of
WAL bytes can exceed the range of bigint? I think that the type of
pg_stat_statements.wal_bytes is also numeric for the same reason.

+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_time</structfield> <type>bigint</type>

Shouldn't the type of wal_xxx_time be double precision,
like pg_stat_database.blk_write_time?

Even when fsync is set to off or wal_sync_method is set to open_sync,
wal_sync is incremented. Isn't this behavior confusing?

+ Total amount of time that has been spent in the portion of
+ WAL data was written to disk by backend and walwriter, in milliseconds
+ (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)

With the patch, track_io_timing controls both IO for data files and
WAL files. But we may want to track only either of them. So it's better
to extend track_io_timing so that we can specify the tracking target
in the parameter? For example, we can make track_io_timing accept
data, wal and all. Or we should introduce new GUC for WAL, e.g.,
track_wal_io_timing? Thought?

I'm afraid that "by backend and walwriter" part can make us thinkg
incorrectly that WAL writes by other processes like autovacuum
are not tracked.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-11-12 05:59:39 Re: logical streaming of xacts via test_decoding is broken
Previous Message David G. Johnston 2020-11-12 05:52:50 Re: Proposition for autoname columns