Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date: 2020-12-02 04:52:43
Message-ID: 451365b7-9e22-211e-428f-fc90ebc715bc@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/12/01 14:01, Fujii Masao wrote:
>
>
> On 2020/11/26 16:07, Masahiro Ikeda wrote:
>> On 2020-11-25 20:19, Fujii Masao wrote:
>>> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>>>> On 2020-11-17 11:46, Fujii Masao wrote:
>>>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>>>>
>>>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote in
>>>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>>>> > for that in your proposal.
>>>>>>>>>>>
>>>>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>>>>> are important.
>>>>>>>>>>>
>>>>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>>>>> value?
>>>>>>>>>>>
>>>>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>>>>
>>>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>>>> ...
>>>>>>>>>>> pgstat_send_wal()
>>>>>>>>>>> {
>>>>>>>>>>> ..
>>>>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>>>>>>>> ...
>>>>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>>>>
>>>>>>>>>>>    /* remember the current numbers */
>>>>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>>>>
>>>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>>>>
>>>>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>>
>>>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>>>>
>>>>>>>> Yes, thanks. I fixed it.
>>>>>>>>
>>>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>>>>
>>>>>>>>> +                if (AmWalWriterProcess()){
>>>>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>>>>> +                }
>>>>>>>>> +                else
>>>>>>>>> +                {
>>>>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>>>>> +                }
>>>>>>>>>
>>>>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>>>>> per process type if necessary. If we adopt this approach, we can easily
>>>>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>>>>
>>>>>>>> I'll remove the above source code because these counters are not useful.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>>>>
>>>>>>>>>> Although there are some parameter related WAL,
>>>>>>>>>> there are few statistics for tuning them.
>>>>>>>>>>
>>>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>>>> Please let me know your comments.
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>>>> wal_records         | 2000224
>>>>>>>>>> wal_fpi             | 47
>>>>>>>>>> wal_bytes           | 248216337
>>>>>>>>>> wal_buffers_full    | 20954
>>>>>>>>>> wal_init_file       | 8
>>>>>>>>>> wal_write_backend   | 20960
>>>>>>>>>> wal_write_walwriter | 46
>>>>>>>>>> wal_write_time      | 51
>>>>>>>>>> wal_sync_backend    | 7
>>>>>>>>>> wal_sync_walwriter  | 8
>>>>>>>>>> wal_sync_time       | 0
>>>>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>>>>
>>>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>>>>
>>>>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>>>>> trends for the entire database instance.
>>>>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>>>>
>>>>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>>>> if users want to see the performance trends  for the entire database,
>>>>>>>>>> they must recalculate the statistics.
>>>>>>>>>>
>>>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2.  WAL segment file creation
>>>>>>>>>>
>>>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>>>>
>>>>>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>>>>
>>>>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>>>>>>
>>>>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>>>>
>>>>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>>>>> we already know synchronous_commit=off is likely to cause the better
>>>>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>>>>> not so useful when tuning synchronous_commit.
>>>>>>>>
>>>>>>>> Thanks, my understanding was wrong.
>>>>>>>> I agreed that your comments.
>>>>>>>>
>>>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>>>>
>>>>>>> Understood.
>>>>>>>
>>>>>>> Thanks for updating the patch!
>>>>>>
>>>>>> Thanks for your comments.
>>>>>>
>>>>>>> patching file src/include/catalog/pg_proc.dat
>>>>>>> Hunk #1 FAILED at 5491.
>>>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>>>> src/include/catalog/pg_proc.dat.rej
>>>>>>>
>>>>>>> I got this failure when applying the patch. Could you update the patch?
>>>>>>
>>>>>> Thanks, I updated the patch.
>>>>>>
>>>>>>> -       Number of times WAL data was written to the disk because WAL
>>>>>>> buffers got full
>>>>>>> +       Total number of times WAL data written to the disk because WAL
>>>>>>> buffers got full
>>>>>>>
>>>>>>> Isn't "was" necessary between "data" and "written"?
>>>>>>
>>>>>> Yes, I fixed it.
>>>>>>
>>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>>>>
>>>>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>>>>
>>>>>> Thanks, I fixed it.
>>>>>>
>>>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>>>>
>>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>>>>
>>>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>>>> like pg_stat_database.blk_write_time?
>>>>>>
>>>>>> Thanks, I changed it.
>>>>>>
>>>>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>>>>
>>>>> What do you think about this comment?
>>>>
>>>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>>>> if a specific fsync method is called.
>>>>
>>>>> I found that we discussed track-WAL-IO-timing feature at the past discussion
>>>>> about the similar feature [1]. But the feature was droppped from the proposal
>>>>> patch because there was the performance concern. So probably we need to
>>>>> revisit the past discussion and benchmark the performance. Thought?
>>>>>
>>>>> If track-WAL-IO-timing feature may cause performance regression,
>>>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
>>>>> from the patch and commit it at first.
>>>>>
>>>>> [1]
>>>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>>>>
>>>> Thanks, I'll check the thread.
>>>> I agree to add basic statistics at first and I attached the patch.
>>>
>>> Thanks!
>>>
>>> +        /* Send WAL statistics */
>>> +        pgstat_send_wal();
>>>
>>> This is not necessary because walwriter generates no WAL data?
>>
>> No, it's not necessary.
>> Thanks. I fixed it.
>
> Thanks for updating the patch! I applied cosmetic changes to it.
> For example, I added more comments. Patch attached.
> Barring no objection, I will commit this patch.

Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-12-02 05:48:02 Re: proposal: unescape_text function
Previous Message David Zhang 2020-12-02 04:52:13 Re: Add table access method as an option to pgbench