From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date: | 2020-12-01 05:01:41 |
Message-ID: | f9197102-4cb4-dfb6-02a4-b8e237c7699f@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020/11/26 16:07, Masahiro Ikeda wrote:
> On 2020-11-25 20:19, Fujii Masao wrote:
>> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>>> On 2020-11-17 11:46, Fujii Masao wrote:
>>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>>>
>>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote in
>>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>>> > for that in your proposal.
>>>>>>>>>>
>>>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>>>> are important.
>>>>>>>>>>
>>>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>>>> value?
>>>>>>>>>>
>>>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>>>
>>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>>> ...
>>>>>>>>>> pgstat_send_wal()
>>>>>>>>>> {
>>>>>>>>>> ..
>>>>>>>>>> /* fill in some values using pgWalUsage */
>>>>>>>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>>>> WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>>> ...
>>>>>>>>>> pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>>>
>>>>>>>>>> /* remember the current numbers */
>>>>>>>>>> prevWalUsage = pgWalUsage;
>>>>>>>>>
>>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>>>
>>>>>>>> + /* fill in some values using pgWalUsage */
>>>>>>>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>> + WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>
>>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>>>
>>>>>>> Yes, thanks. I fixed it.
>>>>>>>
>>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>>>
>>>>>>>> + if (AmWalWriterProcess()){
>>>>>>>> + WalStats.m_wal_write_walwriter++;
>>>>>>>> + }
>>>>>>>> + else
>>>>>>>> + {
>>>>>>>> + WalStats.m_wal_write_backend++;
>>>>>>>> + }
>>>>>>>>
>>>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>>>> per process type if necessary. If we adopt this approach, we can easily
>>>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>>>
>>>>>>> I'll remove the above source code because these counters are not useful.
>>>>>>>
>>>>>>>
>>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>>>
>>>>>>>>> Although there are some parameter related WAL,
>>>>>>>>> there are few statistics for tuning them.
>>>>>>>>>
>>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>>> Please let me know your comments.
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>>> wal_records | 2000224
>>>>>>>>> wal_fpi | 47
>>>>>>>>> wal_bytes | 248216337
>>>>>>>>> wal_buffers_full | 20954
>>>>>>>>> wal_init_file | 8
>>>>>>>>> wal_write_backend | 20960
>>>>>>>>> wal_write_walwriter | 46
>>>>>>>>> wal_write_time | 51
>>>>>>>>> wal_sync_backend | 7
>>>>>>>>> wal_sync_walwriter | 8
>>>>>>>>> wal_sync_time | 0
>>>>>>>>> stats_reset | 2020-10-20 11:04:51.307771+09
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>>>
>>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>>>
>>>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>>>> trends for the entire database instance.
>>>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>>>
>>>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>>> if users want to see the performance trends for the entire database,
>>>>>>>>> they must recalculate the statistics.
>>>>>>>>>
>>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2. WAL segment file creation
>>>>>>>>>
>>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>>>
>>>>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>>>
>>>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>>>>>
>>>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>>>
>>>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>>>> we already know synchronous_commit=off is likely to cause the better
>>>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>>>> not so useful when tuning synchronous_commit.
>>>>>>>
>>>>>>> Thanks, my understanding was wrong.
>>>>>>> I agreed that your comments.
>>>>>>>
>>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>>>
>>>>>> Understood.
>>>>>>
>>>>>> Thanks for updating the patch!
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>>> patching file src/include/catalog/pg_proc.dat
>>>>>> Hunk #1 FAILED at 5491.
>>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>>> src/include/catalog/pg_proc.dat.rej
>>>>>>
>>>>>> I got this failure when applying the patch. Could you update the patch?
>>>>>
>>>>> Thanks, I updated the patch.
>>>>>
>>>>>> - Number of times WAL data was written to the disk because WAL
>>>>>> buffers got full
>>>>>> + Total number of times WAL data written to the disk because WAL
>>>>>> buffers got full
>>>>>>
>>>>>> Isn't "was" necessary between "data" and "written"?
>>>>>
>>>>> Yes, I fixed it.
>>>>>
>>>>>> + <entry role="catalog_table_entry"><para role="column_definition">
>>>>>> + <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>>>
>>>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>>>
>>>>> Thanks, I fixed it.
>>>>>
>>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>>>
>>>>>> + <entry role="catalog_table_entry"><para role="column_definition">
>>>>>> + <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>>>
>>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>>> like pg_stat_database.blk_write_time?
>>>>>
>>>>> Thanks, I changed it.
>>>>>
>>>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>>>
>>>> What do you think about this comment?
>>>
>>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>>> if a specific fsync method is called.
>>>
>>>> I found that we discussed track-WAL-IO-timing feature at the past discussion
>>>> about the similar feature [1]. But the feature was droppped from the proposal
>>>> patch because there was the performance concern. So probably we need to
>>>> revisit the past discussion and benchmark the performance. Thought?
>>>>
>>>> If track-WAL-IO-timing feature may cause performance regression,
>>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
>>>> from the patch and commit it at first.
>>>>
>>>> [1]
>>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>>>
>>> Thanks, I'll check the thread.
>>> I agree to add basic statistics at first and I attached the patch.
>>
>> Thanks!
>>
>> + /* Send WAL statistics */
>> + pgstat_send_wal();
>>
>> This is not necessary because walwriter generates no WAL data?
>
> No, it's not necessary.
> Thanks. I fixed it.
Thanks for updating the patch! I applied cosmetic changes to it.
For example, I added more comments. Patch attached.
Barring no objection, I will commit this patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
0002_add_basic_statistics_to_pg_stat_wal_view_fujii.patch | text/plain | 8.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-12-01 05:38:26 | Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2 |
Previous Message | Kasahara Tatsuhito | 2020-12-01 04:48:41 | Re: autovac issue with large number of tables |