Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date: 2020-11-12 07:27:36
Message-ID: e5676d99-c9f8-8880-b2b3-ba2d7dc27335@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/11/12 14:58, Fujii Masao wrote:
>
>
> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>> On 2020-10-30 11:50, Fujii Masao wrote:
>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>> Hi,
>>>>
>>>> Thanks for your comments and advice. I updated the patch.
>>>>
>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote in
>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>> > of which is in performance-critical path especially in
>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>> > for that in your proposal.
>>>>>
>>>>> We should avoid that duplication as possible even if the both number
>>>>> are important.
>>>>>
>>>>>> Also about performance, I thought there are few impacts because it
>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>> value which already collects these stats, there is no impact in
>>>>>> XLogInsertRecord.
>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>> value?
>>>>>
>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>> takes the difference of that values between two successive calls.
>>>>>
>>>>> WalUsage prevWalUsage;
>>>>> ...
>>>>> pgstat_send_wal()
>>>>> {
>>>>> ..
>>>>>    /* fill in some values using pgWalUsage */
>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>> ...
>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>
>>>>>    /* remember the current numbers */
>>>>>    prevWalUsage = pgWalUsage;
>>>>
>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>> which is already defined and eliminates the extra overhead.
>>>
>>> +    /* fill in some values using pgWalUsage */
>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>
>>> It's better to use WalUsageAccumDiff() here?
>>
>> Yes, thanks. I fixed it.
>>
>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>
>>> +                if (AmWalWriterProcess()){
>>> +                    WalStats.m_wal_write_walwriter++;
>>> +                }
>>> +                else
>>> +                {
>>> +                    WalStats.m_wal_write_backend++;
>>> +                }
>>>
>>> I think that it's better not to separate m_wal_write_xxx into two for
>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>> counter and make pgstat_send_wal() send also the process type to
>>> the stats collector. Then the stats collector can accumulate the counters
>>> per process type if necessary. If we adopt this approach, we can easily
>>> extend pg_stat_wal so that any fields can be reported per process type.
>>
>> I'll remove the above source code because these counters are not useful.
>>
>>
>> On 2020-10-30 12:00, Fujii Masao wrote:
>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>> Hi,
>>>>
>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>
>>>> Although there are some parameter related WAL,
>>>> there are few statistics for tuning them.
>>>>
>>>> I think it's better to provide the following statistics.
>>>> Please let me know your comments.
>>>>
>>>> ```
>>>> postgres=# SELECT * from pg_stat_wal;
>>>> -[ RECORD 1 ]-------+------------------------------
>>>> wal_records         | 2000224
>>>> wal_fpi             | 47
>>>> wal_bytes           | 248216337
>>>> wal_buffers_full    | 20954
>>>> wal_init_file       | 8
>>>> wal_write_backend   | 20960
>>>> wal_write_walwriter | 46
>>>> wal_write_time      | 51
>>>> wal_sync_backend    | 7
>>>> wal_sync_walwriter  | 8
>>>> wal_sync_time       | 0
>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>> ```
>>>>
>>>> 1. Basic statistics of WAL activity
>>>>
>>>> - wal_records: Total number of WAL records generated
>>>> - wal_fpi: Total number of WAL full page images generated
>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>
>>>> To understand DB's performance, first, we will check the performance
>>>> trends for the entire database instance.
>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>
>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>> autovacuum and pg_stat_statements now,
>>>> if users want to see the performance trends  for the entire database,
>>>> they must recalculate the statistics.
>>>>
>>>> I think it is useful to add the sum of the basic statistics.
>>>>
>>>>
>>>> 2.  WAL segment file creation
>>>>
>>>> - wal_init_file: Total number of WAL segment files created.
>>>>
>>>> To create a new WAL file may have an impact on the performance of
>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>> so that more "recycled" WAL files can be held.
>>>>
>>>>
>>>>
>>>> 3. Number of when WAL is flushed
>>>>
>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>
>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>
>>> I just wonder how useful these counters are. Even without these counters,
>>> we already know synchronous_commit=off is likely to cause the better
>>> performance (but has the risk of data loss). So ISTM that these counters are
>>> not so useful when tuning synchronous_commit.
>>
>> Thanks, my understanding was wrong.
>> I agreed that your comments.
>>
>> I merged the statistics of *_backend and *_walwriter.
>> I think the sum of them is useful to calculate the average per write/sync time.
>> For example, per write time is equals wal_write_time / wal_write.
>
> Understood.
>
> Thanks for updating the patch!
>
> patching file src/include/catalog/pg_proc.dat
> Hunk #1 FAILED at 5491.
> 1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej
>
> I got this failure when applying the patch. Could you update the patch?
>
>
> -       Number of times WAL data was written to the disk because WAL buffers got full
> +       Total number of times WAL data written to the disk because WAL buffers got full
>
> Isn't "was" necessary between "data" and "written"?
>
>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>
> Shouldn't the type of wal_bytes be numeric because the total number of
> WAL bytes can exceed the range of bigint? I think that the type of
> pg_stat_statements.wal_bytes is also numeric for the same reason.
>
>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>
> Shouldn't the type of wal_xxx_time be double precision,
> like pg_stat_database.blk_write_time?
>
>
> Even when fsync is set to off or wal_sync_method is set to open_sync,
> wal_sync is incremented. Isn't this behavior confusing?
>
>
> +       Total amount of time that has been spent in the portion of
> +       WAL data was written to disk by backend and walwriter, in milliseconds
> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
>
> With the patch, track_io_timing controls both IO for data files and
> WAL files. But we may want to track only either of them. So it's better
> to extend track_io_timing so that we can specify the tracking target
> in the parameter? For example, we can make track_io_timing accept
> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
> track_wal_io_timing? Thought?
>
> I'm afraid that "by backend and walwriter" part can make us thinkg
> incorrectly that WAL writes by other processes like autovacuum
> are not tracked.

pgstat_send_wal(void)
{
+ /* fill in some values using pgWalUsage */
+ WalUsage walusage;
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

At the first call to pgstat_send_wal(), prevWalUsage has not been set to
the previous value of pgWalUsage. So the calculation result of
WalUsageAccumDiff() can be incorrect. To address this issue,
prevWalUsage should be set to pgWalUsage or both should be initialized
with 0 at the beginning of the process, for example?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-11-12 07:45:21 Re: proposal: possibility to read dumped table's name from file
Previous Message Michael Paquier 2020-11-12 07:12:09 Re: abstract Unix-domain sockets