Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Li Japin <japinli(at)hotmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-01-26 06:56:22
Message-ID: 395f109540835fabe76dbd9d45733a14@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, David.

Thanks for your comments.

On 2021-01-26 08:48, David G. Johnston wrote:
> On Mon, Jan 25, 2021 at 8:03 AM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>> On Mon, Jan 25, 2021 at 4:51 PM Masahiro Ikeda
>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>> Hi, thanks for the reviews.
>>>
>>> I updated the attached patch.
>>
>> Thank you for updating the patch!
>
> Your original email with "total number of times" is more correct,
> removing the "of times" and just writing "total number of WAL" is not
> good wording.
>
> Specifically, this change is strictly worse than the original.
>
> - Number of times WAL data was written to disk because WAL
> buffers became full
> + Total number of WAL data written to disk because WAL buffers
> became full
>
> Both have the flaw that they leave implied exactly what it means to
> "write WAL to disk". It is also unclear whether a counter, bytes, or
> both, would be more useful here. I've incorporated this into my
> documentation suggestions below:
> (wal_buffers_full)
>
> -- Revert - the original was better, though maybe add more detail
> similar to the below. I didn't research exactly how this works.

OK, I understood.
I reverted since this is a counter statistics.

> (wal_write)
> The number of times WAL buffers were written out to disk via XLogWrite
>

Thanks.

I thought it's better to omit "The" and "XLogWrite" because other views'
description
omits "The" and there is no description of "XlogWrite" in the documents.
What do you think?

> -- Seems like this should have a bytes version too

Do you mean that we need to separate statistics for wal write?

> (wal_write_time)
> The amount of time spent writing WAL buffers to disk, excluding sync
> time unless the wal_sync_method is either open_datasync or open_sync.
> Units are in milliseconds with microsecond resolution. This is zero
> when track_wal_io_timing is disabled.

Thanks, I'll fix it.

> (wal_sync)
> The number of times WAL files were synced to disk while
> wal_sync_method was set to one of the "sync at commit" options (i.e.,
> fdatasync, fsync, or fsync_writethrough).

Thanks, I'll fix it.

> -- it is not going to be zero just because those settings are
> presently disabled as they could have been enabled at some point since
> the last time these statistics were reset.

Right, your description is correct.
The "track_wal_io_timing" has the same limitation, doesn't it?

> (wal_sync_time)
> The amount of time spent syncing WAL files to disk, in milliseconds
> with microsecond resolution. This requires setting wal_sync_method to
> one of the "sync at commit" options (i.e., fdatasync, fsync, or
> fsync_writethrough).

Thanks, I'll fix it.
I will add the comments related to "track_wal_io_timing".

> Also,
>
> I would suggest extracting the changes to postmaster/pgstat.c and
> replication/walreceiver.c to a separate patch as you've fundamentally
> changed how it behaves with regards to that function and how it
> interacts with the WAL receiver. That seems an entirely separate
> topic warranting its own patch and discussion.

OK, I will separate two patches.

On 2021-01-26 08:52, David G. Johnston wrote:
> On Mon, Jan 25, 2021 at 4:37 PM Masahiro Ikeda
> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
>> I agree with your comments. I think it should report when
>> reaching the end of WAL too. I add the code to report the stats
>> when finishing the current WAL segment file when timeout in the
>> main loop and when reaching the end of WAL.
>
> The following is not an improvement:
>
> - /* Send WAL statistics to the stats collector. */+ /* Send WAL
> statistics to stats collector */
>
> The word "the" there makes it proper English. Your copy-pasting
> should have kept the existing good wording in the other locations
> rather than replace the existing location with the newly added
> incorrect wording.

Thanks, I'll fix it.

> This doesn't make sense:
>
> * current WAL segment file to avoid loading stats collector.
>
> Maybe "overloading" or "overwhelming"?
>
> I see you removed the pgstat_send_wal(force) change. The rest of my
> comments on the v6 patch still stand I believe.

Yes, "overloading" is right. Thanks.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-26 06:58:32 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message David Rowley 2021-01-26 06:55:20 Re: Tid scan improvements