Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Li Japin <japinli(at)hotmail(dot)com>
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-01-25 01:36:31
Message-ID: CAD21AoC+Z7-OAwr0RES+WgAvMmi3Uv-bh6u4W1cfS_=Z0hddqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
<ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2021-01-22 14:50, Masahiko Sawada wrote:
> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
> > <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> I rebased the patch to the master branch.
> >
> > Thank you for working on this. I've read the latest patch. Here are
> > comments:
> >
> > ---
> > + if (track_wal_io_timing)
> > + {
> > + INSTR_TIME_SET_CURRENT(duration);
> > + INSTR_TIME_SUBTRACT(duration, start);
> > + WalStats.m_wal_write_time +=
> > INSTR_TIME_GET_MILLISEC(duration);
> > + }
> >
> > * I think it should add the time in micro sec.
> > After running pgbench with track_wal_io_timing = on for 30 sec,
> > pg_stat_wal showed the following on my environment:
> >
> > postgres(1:61569)=# select * from pg_stat_wal;
> > -[ RECORD 1 ]----+-----------------------------
> > wal_records | 285947
> > wal_fpi | 53285
> > wal_bytes | 442008213
> > wal_buffers_full | 0
> > wal_write | 25516
> > wal_write_time | 0
> > wal_sync | 25437
> > wal_sync_time | 14490
> > stats_reset | 2021-01-22 10:56:13.29464+09
> >
> > Since writes can complete less than a millisecond, wal_write_time
> > didn't increase. I think sync_time could also have the same problem.
>
> Thanks for your comments. I didn't notice that.
> I changed the unit from milliseconds to microseconds.
>
> > ---
> > + /*
> > + * Measure i/o timing to fsync WAL data.
> > + *
> > + * The wal receiver skip to collect it to avoid performance
> > degradation of standy servers.
> > + * If sync_method doesn't have its fsync method, to skip too.
> > + */
> > + if (!AmWalReceiverProcess() && track_wal_io_timing &&
> > fsyncMethodCalled())
> > + INSTR_TIME_SET_CURRENT(start);
> >
> > * Why does only the wal receiver skip it even if track_wal_io_timinig
> > is true? I think the performance degradation is also true for backend
> > processes. If there is another reason for that, I think it's better to
> > mention in both the doc and comment.
> > * How about checking track_wal_io_timing first?
> > * s/standy/standby/
>
> I fixed it.
> As kuroda-san mentioned too, the skip is no need to be considered.

I think you also removed the code to have the wal receiver report the
stats. So with the latest patch, the wal receiver tracks those
statistics but doesn't report.

And maybe XLogWalRcvWrite() also needs to track I/O?

>
> > ---
> > + /* increment the i/o timing and the number of times to fsync WAL
> > data */
> > + if (fsyncMethodCalled())
> > + {
> > + if (!AmWalReceiverProcess() && track_wal_io_timing)
> > + {
> > + INSTR_TIME_SET_CURRENT(duration);
> > + INSTR_TIME_SUBTRACT(duration, start);
> > + WalStats.m_wal_sync_time +=
> > INSTR_TIME_GET_MILLISEC(duration);
> > + }
> > +
> > + WalStats.m_wal_sync++;
> > + }
> >
> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
> > incrementing m_wal_sync after each sync operation?
>
> I think if syncing the disk does not occur, m_wal_sync should not be
> incremented.
> It depends enableFsync and sync_method.
>
> enableFsync is checked in each fsync method like
> pg_fsync_no_writethrough(),
> so if incrementing m_wal_sync after each sync operation, it should be
> implemented
> in each fsync method. It leads to many duplicated codes.

Right. I missed that each fsync function checks enableFsync.

> So, why don't you change the function to a flag whether to
> sync data to the disk will be occurred or not in issue_xlog_fsync()?

Looks better. Since we don't necessarily need to increment m_wal_sync
after doing fsync we can write the code without an additional variable
as follows:

if (enableFsync)
{
switch (sync_method)
{
case SYNC_METHOD_FSYNC:
#ifdef HAVE_FSYNC_WRITETHROUGH
case SYNC_METHOD_FSYNC_WRITETHROUGH:
#endif
#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
#endif
WalStats.m_wal_sync++;
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
break;
default:
break;
}
}

(do fsync and error handling here)

/* increment the i/o timing and the number of times to fsync WAL data */
if (track_wal_io_timing)
{
INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
}

I think we can change the first switch-case to an if statement.

>
>
> > * As far as I read the code, issue_xlog_fsync() seems to do fsync even
> > if enableFsync is false. Why does the function return false in that
> > case? I might be missing something.
>
> IIUC, the reason is that I thought that each fsync functions like
> pg_fsync_no_writethrough() check enableFsync.
>
> If this code doesn't check, m_wal_sync_time may be incremented
> even though some sync methods like SYNC_METHOD_OPEN don't call to sync
> some data to the disk at the time.

Right.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2021-01-25 02:21:12 Re: Extend more usecase for planning time partition pruning and init partition pruning.
Previous Message kuroda.hayato@fujitsu.com 2021-01-25 01:34:53 RE: About to add WAL write/fsync statistics to pg_stat_wal view