Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Li Japin <japinli(at)hotmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-03-15 01:39:06
Message-ID: 90d8baf25655a08299185cb1a616bda1@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-03-12 12:39, Fujii Masao wrote:
> On 2021/03/11 21:29, Masahiro Ikeda wrote:
>>>> In addition, I rebased the patch for WAL receiver.
>>>> (v17-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>
>>> Thanks! Will review this later.
>>
>> Thanks a lot!
>
> I read through the 0003 patch. Here are some comments for that.
>
> With the patch, walreceiver's stats are counted as wal_write,
> wal_sync, wal_write_time and wal_sync_time in pg_stat_wal. But they
> should be counted as different columns because WAL IO is different
> between walreceiver and other processes like a backend? For example,
> open_sync or open_datasync is chosen as wal_sync_method, those other
> processes use O_DIRECT flag to open WAL files, but walreceiver does
> not. For example, those other procesess write WAL data in block units,
> but walreceiver does not. So I'm concerned that mixing different WAL
> IO stats in the same columns would confuse the users. Thought? I'd
> like to hear more opinions about how to expose walreceiver's stats to
> users.

Thanks, I understood get_sync_bit() checks the sync flags and
the write unit of generated wal data and replicated wal data is
different.
(It's interesting optimization whether to use kernel cache or not.)

OK. Although I agree to separate the stats for the walrecever,
I want to hear opinions from other people too. I didn't change the
patch.

Please feel to your comments.

> +int
> +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset)
>
> This common function writes WAL data and measures IO timing. IMO we
> can refactor the code furthermore by making this function handle the
> case where pg_write() reports an error. In other words, I think that
> the function should do what do-while loop block in XLogWrite() does.
> Thought?

OK. I agree.

I wonder to change the error check ways depending on who calls this
function?
Now, only the walreceiver checks (1)errno==0 and doesn't check
(2)errno==ENITR.
Other processes are the opposite.

IIUC, it's appropriate that every process checks (1)(2).
Please let me know my understanding is wrong.

> BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
> reports an error. But this is useless. Probably IO timing should be
> incremented after the return code of pg_pwrite() is checked, instead?

Yes, I agree. I fixed it.
(v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)

BTW, thanks for your comments in person that the bgwriter process will
generate wal data.
I checked that it generates the WAL to take a snapshot via
LogStandySnapshot().
I attached the patch for bgwriter to send the wal stats.
(v18-0005-send-stats-for-bgwriter-when-shutdown.patch)

This patch includes the following changes.

(1) introduce pgstat_send_bgwriter() the mechanism to send the stats
if PGSTAT_STAT_INTERVAL msec has passed like pgstat_send_wal()
to avoid overloading to stats collector because "bgwriter_delay"
can be set for 10msec or more.

(2) remove pgstat_report_wal() and integrate with pgstat_send_wal()
because bgwriter sends WalStats.m_wal_records and to avoid
overloading (see (1)).
IIUC, although the pros to separate them is to reduce the
calculation cost of
WalUsageAccumDiff(), the impact is limited.

(3) make a new signal handler for bgwriter to force sending remaining
stats during shutdown
because of (1) and remove HandleMainLoopInterrupts() because there
are no processes to use it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment Content-Type Size
v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch text/x-diff 6.7 KB
v18-0005-send-stats-for-bgwriter-when-shutdown.patch text/x-diff 10.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiro Ikeda 2021-03-15 01:54:01 Re: About to add WAL write/fsync statistics to pg_stat_wal view
Previous Message Thomas Munro 2021-03-15 01:25:38 Re: Collation versioning