Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: japin <japinli(at)hotmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-01-25 04:22:13
Message-ID: bf947e67a5cc596baae1debce26d68bf@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-01-25 11:47, japin wrote:
> On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>> On 2021-01-22 14:50, Masahiko Sawada wrote:
>>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>>> > <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I rebased the patch to the master branch.
>>> >
>>> > Thank you for working on this. I've read the latest patch. Here are
>>> > comments:
>>> >
>>> > ---
>>> > + if (track_wal_io_timing)
>>> > + {
>>> > + INSTR_TIME_SET_CURRENT(duration);
>>> > + INSTR_TIME_SUBTRACT(duration, start);
>>> > + WalStats.m_wal_write_time +=
>>> > INSTR_TIME_GET_MILLISEC(duration);
>>> > + }
>>> >
>>> > * I think it should add the time in micro sec.
>>> > After running pgbench with track_wal_io_timing = on for 30 sec,
>>> > pg_stat_wal showed the following on my environment:
>>> >
>>> > postgres(1:61569)=# select * from pg_stat_wal;
>>> > -[ RECORD 1 ]----+-----------------------------
>>> > wal_records | 285947
>>> > wal_fpi | 53285
>>> > wal_bytes | 442008213
>>> > wal_buffers_full | 0
>>> > wal_write | 25516
>>> > wal_write_time | 0
>>> > wal_sync | 25437
>>> > wal_sync_time | 14490
>>> > stats_reset | 2021-01-22 10:56:13.29464+09
>>> >
>>> > Since writes can complete less than a millisecond, wal_write_time
>>> > didn't increase. I think sync_time could also have the same problem.
>>>
>>> Thanks for your comments. I didn't notice that.
>>> I changed the unit from milliseconds to microseconds.
>>>
>>> > ---
>>> > + /*
>>> > + * Measure i/o timing to fsync WAL data.
>>> > + *
>>> > + * The wal receiver skip to collect it to avoid performance
>>> > degradation of standy servers.
>>> > + * If sync_method doesn't have its fsync method, to skip too.
>>> > + */
>>> > + if (!AmWalReceiverProcess() && track_wal_io_timing &&
>>> > fsyncMethodCalled())
>>> > + INSTR_TIME_SET_CURRENT(start);
>>> >
>>> > * Why does only the wal receiver skip it even if track_wal_io_timinig
>>> > is true? I think the performance degradation is also true for backend
>>> > processes. If there is another reason for that, I think it's better to
>>> > mention in both the doc and comment.
>>> > * How about checking track_wal_io_timing first?
>>> > * s/standy/standby/
>>>
>>> I fixed it.
>>> As kuroda-san mentioned too, the skip is no need to be considered.
>>
>> I think you also removed the code to have the wal receiver report the
>> stats. So with the latest patch, the wal receiver tracks those
>> statistics but doesn't report.
>>
>> And maybe XLogWalRcvWrite() also needs to track I/O?
>>
>>>
>>> > ---
>>> > + /* increment the i/o timing and the number of times to fsync WAL
>>> > data */
>>> > + if (fsyncMethodCalled())
>>> > + {
>>> > + if (!AmWalReceiverProcess() && track_wal_io_timing)
>>> > + {
>>> > + INSTR_TIME_SET_CURRENT(duration);
>>> > + INSTR_TIME_SUBTRACT(duration, start);
>>> > + WalStats.m_wal_sync_time +=
>>> > INSTR_TIME_GET_MILLISEC(duration);
>>> > + }
>>> > +
>>> > + WalStats.m_wal_sync++;
>>> > + }
>>> >
>>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
>>> > incrementing m_wal_sync after each sync operation?
>>>
>>> I think if syncing the disk does not occur, m_wal_sync should not be
>>> incremented.
>>> It depends enableFsync and sync_method.
>>>
>>> enableFsync is checked in each fsync method like
>>> pg_fsync_no_writethrough(),
>>> so if incrementing m_wal_sync after each sync operation, it should be
>>> implemented
>>> in each fsync method. It leads to many duplicated codes.
>>
>> Right. I missed that each fsync function checks enableFsync.
>>
>>> So, why don't you change the function to a flag whether to
>>> sync data to the disk will be occurred or not in issue_xlog_fsync()?
>>
>> Looks better. Since we don't necessarily need to increment m_wal_sync
>> after doing fsync we can write the code without an additional variable
>> as follows:
>>
>> if (enableFsync)
>> {
>> switch (sync_method)
>> {
>> case SYNC_METHOD_FSYNC:
>> #ifdef HAVE_FSYNC_WRITETHROUGH
>> case SYNC_METHOD_FSYNC_WRITETHROUGH:
>> #endif
>> #ifdef HAVE_FDATASYNC
>> case SYNC_METHOD_FDATASYNC:
>> #endif
>> WalStats.m_wal_sync++;
>> if (track_wal_io_timing)
>> INSTR_TIME_SET_CURRENT(start);
>> break;
>> default:
>> break;
>> }
>> }
>>
>> (do fsync and error handling here)
>>
>> /* increment the i/o timing and the number of times to fsync WAL
>> data */
>> if (track_wal_io_timing)
>> {
>> INSTR_TIME_SET_CURRENT(duration);
>> INSTR_TIME_SUBTRACT(duration, start);
>> WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
>> }
>>
>> I think we can change the first switch-case to an if statement.
>>
>
> +1. We can also narrow the scope of "duration" into "if
> (track_wal_io_timing)" branch.

Thanks, I'll change it.

--
Masahiro Ikeda
NTT DATA CORPORATION

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-01-25 04:23:40 Re: Is Recovery actually paused?
Previous Message tsunakawa.takay@fujitsu.com 2021-01-25 04:21:39 RE: POC: postgres_fdw insert batching